---
title: "SDM4 in R: Scatterplots, Association, and Correlation (Chapter 6)"
author: "Nicholas Horton (nhorton@amherst.edu)"
date: "January 2, 2017"
output:
  html_document:
    fig_height: 5
    fig_width: 7
  pdf_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---

```{r, include=FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
```

```{r, include=FALSE}
# Some customization. You can alter or delete as desired (if you know what you are doing).

# This changes the default colors in lattice plots.
trellis.par.set(theme=theme.mosaic())

# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
  tidy=FALSE,     # display code as typed
  size="small"    # slightly smaller font for code
)
```

## Introduction and background

This document is intended to help describe how to undertake analyses introduced as examples in the Fourth Edition of \emph{Stats: Data and Models} (2014) by De Veaux, Velleman, and Bock. More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This file as well as the associated R Markdown reproducible analysis source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4.

This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic).

## Chapter 6: Scatterplots, Association, and Correlation

### Section 6.1: Scatterplots

Figure 6.1 (page 152) displays the scatterplot of the average tracking error over time.
```{r message=FALSE}
library(mosaic)
library(readr)
options(digits=3)
Hurricanes <- read_csv("http://nhorton.people.amherst.edu/sdm4/data/Tracking_hurricanes_2012.csv")
xyplot(Error72h ~ Year, ylab="Prediction error (nautical miles)", data=Hurricanes)
```

### Section 6.2: Correlation

Figure 6.2 (page 155) displays the scatterplot of weight vs. height for a sample of students from statistics classes.

```{r message=FALSE}
HtWt <- read_csv("http://nhorton.people.amherst.edu/sdm4/data/Heights_and_Weights.csv")
xyplot(Weight ~ Height, ylab="Weight (lbs)", xlab="Height (in)", data=HtWt)
# Pearson correlation (the default method)
cor(Weight ~ Height, data=HtWt)
```

#### Kendall's Tau and Spearman's Rho

```{r}
# Rank-based alternatives to the Pearson correlation
cor(Weight ~ Height, method="kendall", data=HtWt)
cor(Weight ~ Height, method="spearman", data=HtWt)
```

### Section 6.3: Warning: Correlation does not always equal Causation

### Section 6.4: Straightening scatterplots

Since the dataset for Figure 6.10 (page 165) is so small, we can enter it by hand.

```{r}
fstop <- c(2.8, 4, 5.6, 8, 11, 16, 22, 32)
shutter <- c(1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8)
lenses <- data.frame(fstop, shutter)
xyplot(fstop ~ shutter, ylab="f/stop", xlab="Shutter Speed (sec)", data=lenses)
```

A new transformed variable can be added to the data frame using the `mutate` function.

```{r}
# Re-express f/stop by squaring it, then replot against shutter speed
lenses <- mutate(lenses, fstopsq = fstop*fstop)
xyplot(fstopsq ~ shutter, ylab="f/stop (squared)", xlab="Shutter Speed (sec)", data=lenses)
```
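
As a rough numerical companion to the two plots, the correlation can be computed before and after the re-expression. The chunk below is a minimal sketch that assumes base R's `cor()` with its default Pearson method; `r_raw` and `r_fstopsq` are illustrative names that do not appear elsewhere in this document. Correlation only measures the strength of linear association, so the scatterplot remains the main check that the relationship has been straightened.

```{r}
# Same hand-entered values as in Section 6.4, repeated so this chunk runs on its own.
fstop <- c(2.8, 4, 5.6, 8, 11, 16, 22, 32)
shutter <- c(1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8)

# Pearson correlation of shutter speed with f/stop, and with f/stop squared.
r_raw <- cor(fstop, shutter)
r_fstopsq <- cor(fstop^2, shutter)

# Show the two values side by side (illustrative labels).
c(raw = r_raw, squared = r_fstopsq)
```

If the squared f/stop is closer to a straight-line relationship with shutter speed, the second value should be nearer to 1 than the first.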