---
title: "SDM4 in R: Scatterplots, Association, and Correlation (Chapter 6)"
author: "Nicholas Horton (nhorton@amherst.edu) and Sarah McDonald"
date: "June 13, 2018"
output:
  pdf_document:
    fig_height: 3
    fig_width: 6
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---

```{r, include = FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
```

```{r, include = FALSE}
# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
  tidy = FALSE,     # display code as typed
  size = "small"    # slightly smaller font for code
)
```

## Introduction and background

This document is intended to help describe how to undertake analyses introduced as examples in the Fourth Edition of *Stats: Data and Models* (2014) by De Veaux, Velleman, and Bock. More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This file as well as the associated R Markdown reproducible analysis source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4.

This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic). A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.

## Chapter 6: Scatterplots, Association, and Correlation

### Section 6.1: Scatterplots

Figure 6.1 (page 152) displays the scatterplot of the average tracking error over time.

```{r message = FALSE}
library(mosaic)
library(readr)
options(digits = 3)
Hurricanes <-
  read_csv("http://nhorton.people.amherst.edu/sdm4/data/Tracking_hurricanes_2012.csv")
gf_point(Error72h ~ Year, ylab = "Prediction error (nautical miles)",
         data = Hurricanes)
```

### Section 6.2: Correlation

Figure 6.2 (page 155) displays the scatterplot of weight vs. height for a sample of students from statistics classes.

```{r message = FALSE}
HtWt <-
  read_csv("http://nhorton.people.amherst.edu/sdm4/data/Heights_and_Weights.csv")
gf_point(Weight ~ Height, ylab = "Weight (lbs)", xlab = "Height (in)",
         data = HtWt)
cor(Weight ~ Height, data = HtWt)   # Pearson correlation (the default)
```

#### Kendall's Tau and Spearman's Rho

```{r}
# rank-based alternatives to the default Pearson correlation
cor(Weight ~ Height, method = "kendall", data = HtWt)
cor(Weight ~ Height, method = "spearman", data = HtWt)
```

### Section 6.3: Warning: Correlation does not always equal Causation

### Section 6.4: Straightening scatterplots

Since the dataset for Figure 6.10 (page 165) is so small, we can enter it by hand.

```{r}
fstop <- c(2.8, 4, 5.6, 8, 11, 16, 22, 32)
shutter <- c(1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8)
lenses <- data.frame(fstop, shutter)
gf_point(fstop ~ shutter, ylab = "f/stop", xlab = "Shutter Speed (sec)",
         data = lenses)
```

A new transformed variable can be added using the `mutate` function.

```{r}
# re-express f/stop by squaring it, then replot against shutter speed
lenses <- mutate(lenses, fstopsq = fstop * fstop)
gf_point(fstopsq ~ shutter, ylab = "f/stop (squared)",
         xlab = "Shutter Speed (sec)", data = lenses)
```
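
A short calculation suggests why squaring is a natural re-expression for these data. This is only a sketch: it assumes the standard photographic exposure relationship, which is not stated in this document, namely that for a fixed exposure the light admitted is proportional to the shutter time divided by the square of the f/stop. The symbols $t$ (shutter time), $N$ (f/stop), and $k$ (a constant) are introduced here purely for illustration.

$$
\frac{t}{N^2} = \text{constant}
\quad\Longrightarrow\quad
N^2 = k\,t
$$

Under that assumption, `fstopsq` should be roughly proportional to `shutter`, so the ratio `fstopsq / shutter` should be about the same in every row. Checking three rows of the data entered above:

$$
2.8^2 \times 1000 = 7840, \qquad
8^2 \times 125 = 8000, \qquad
32^2 \times 8 = 8192
$$

The ratios are close but not identical, most likely because the marked f/stops and shutter speeds are rounded nominal values.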