This document describes how to carry out the analyses introduced as examples in the Fourth Edition of Stats: Data and Models (2014) by De Veaux, Velleman, and Bock. More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This file, as well as the associated R Markdown reproducible analysis source file used to create it, can be found at http://nhorton.people.amherst.edu/sdm4.
This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the mosaic package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic).
See Figure 4.1 on page 85
library(mosaic); library(readr)
options(digits=3)
Hopkins <-
read_delim("http://nhorton.people.amherst.edu/sdm4/data/Hopkins_Forest_2011.txt", delim="\t")
names(Hopkins)
## [1] "Season" "AvgWindSpeed" "Month" "Day"
## [5] "DayofYear" "AvgTempC" "AvgTempF" "MaxWindSpeed"
## [9] "AvgBarom" "Precip"
histogram(~ AvgWindSpeed, width=0.5, center=0.24,
xlab="Average wind speed (mph)", data=Hopkins)
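The width and center arguments in the histogram() call control where the bins fall. As a rough base R sketch, assuming that center names the midpoint of one bin (so that center=0.24 with width=0.5 puts bin edges at -0.01, 0.49, 0.99, and so on), the breaks and counts can be worked out by hand. The vector toy_wind and the other object names are hypothetical and are not part of the Hopkins data.

```r
# Hypothetical toy wind speeds (mph); not the Hopkins data
toy_wind <- c(0, 0.35, 0.5, 0.71, 1.62, 5.47)

width  <- 0.5
center <- 0.24

# Assumption: 'center' is the midpoint of a bin, so edges sit at center +/- width/2
lowest_edge <- center - width / 2
n_bins <- ceiling((max(toy_wind) - lowest_edge) / width)
breaks <- lowest_edge + width * (0:n_bins)

# Count observations per bin, using left-closed intervals [a, b)
bin_counts <- as.vector(table(cut(toy_wind, breaks = breaks, right = FALSE)))
print(round(breaks, 2))
print(bin_counts)
```

With these edges, a value recorded as exactly 0.5 falls inside a bin rather than on a boundary, which is one plausible reason for choosing an offset center.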
Here we reproduce Figure 4.2 on page 85
Hopkins <- mutate(Hopkins,
Summer = Month >= 4 & Month <= 9,
Winter = !Summer
)
histogram(~ AvgWindSpeed, width=0.5, center=0.24,
xlab="Average summer wind speed (mph)", data=filter(Hopkins, Summer==TRUE))
favstats(~ AvgWindSpeed, data=filter(Hopkins, Summer==TRUE))
## min Q1 median Q3 max mean sd n missing
## 0 0.35 0.71 1.62 5.47 1.11 1.1 183 0
histogram(~ AvgWindSpeed, width=0.5, center=0.24,
xlab="Average winter wind speed (mph)", data=filter(Hopkins, Winter==TRUE))
favstats(~ AvgWindSpeed, data=filter(Hopkins, Winter==TRUE))
## min Q1 median Q3 max mean sd n missing
## 0.02 0.84 1.72 2.66 6.73 1.9 1.29 182 0
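The group sizes reported in the two favstats() outputs (183 summer days and 182 winter days) follow directly from defining Summer as April through September. A minimal base R check, assuming the data contain one row for every day of 2011 (as the file name suggests); the object names are hypothetical:

```r
# Every calendar day of 2011 (assumption: the data set has one row per day)
days <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")
month <- as.integer(format(days, "%m"))

# Same rule as in the mutate() call: April through September counts as summer
summer <- month >= 4 & month <= 9
winter <- !summer

n_summer <- sum(summer)
n_winter <- sum(winter)
print(c(summer = n_summer, winter = n_winter))
```

The two counts should agree with the n column of the summer and winter summaries.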
Here we reproduce Figure 4.3 on page 87
bwplot(AvgWindSpeed ~ as.factor(Month), data=Hopkins)
filter(Hopkins, Month==2, AvgWindSpeed > 6) # in February
## # A tibble: 1 × 12
## Season AvgWindSpeed Month Day DayofYear AvgTempC AvgTempF MaxWindSpeed
## <chr> <dbl> <int> <int> <int> <dbl> <dbl> <dbl>
## 1 Winter 6.73 2 19 50 -5.09 22.8 39.5
## # ... with 4 more variables: AvgBarom <dbl>, Precip <dbl>, Summer <lgl>,
## # Winter <lgl>
filter(Hopkins, Month==6, AvgWindSpeed > 3.9) # in June
## # A tibble: 1 × 12
## Season AvgWindSpeed Month Day DayofYear AvgTempC AvgTempF MaxWindSpeed
## <chr> <dbl> <int> <int> <int> <dbl> <dbl> <dbl>
## 1 Summer 3.93 6 2 153 14.7 58.5 38.8
## # ... with 4 more variables: AvgBarom <dbl>, Precip <dbl>, Summer <lgl>,
## # Winter <lgl>
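The two filter() calls pick out points that the side-by-side boxplots display individually. As a sketch of the usual flagging rule (points more than 1.5 IQRs beyond the quartiles), base R's boxplot.stats() can be applied to a small hypothetical sample. The values below are made up for illustration, apart from 6.73, which is the February value shown in the output. Note that boxplot.stats() works from hinges, which can differ slightly from the Q1 and Q3 reported by favstats().

```r
# Hypothetical daily wind speeds for one month (mph); only 6.73 is from the output
toy_month <- c(0.5, 1.2, 1.8, 2.1, 2.4, 2.9, 3.3, 6.73)

bs <- boxplot.stats(toy_month, coef = 1.5)

# bs$stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
hinges <- bs$stats[c(2, 4)]
iqr <- diff(hinges)
upper_fence <- hinges[2] + 1.5 * iqr

# Points beyond the fences are reported separately
outliers <- bs$out
print(upper_fence)
print(outliers)
```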
See Figures 4.4 through 4.6 starting on page 92
xyplot(AvgWindSpeed ~ DayofYear, data=Hopkins)
xyplot(AvgWindSpeed ~ DayofYear, type="l", data=Hopkins)
xyplot(AvgWindSpeed ~ DayofYear, type=c("p", "smooth"), lwd=3, data=Hopkins)
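The "smooth" option in the last plot overlays a loess curve on the points. A rough base R sketch of the same idea, using synthetic data rather than the Hopkins measurements: the seasonal shape, noise level, and object names are all assumptions made for illustration, and the span and degree are the loess() defaults, which may not match what xyplot() uses.

```r
set.seed(4)  # reproducible synthetic data

# Synthetic "wind speed" with a higher level in winter than in summer (assumed shape)
day <- 1:365
toy <- data.frame(
  day = day,
  wind = 1.5 + 0.5 * cos(2 * pi * day / 365) + rnorm(365, sd = 0.3)
)

# Fit a loess smoother and evaluate it at each observed day
fit <- loess(wind ~ day, data = toy)
smoothed <- predict(fit)

# Scatterplot with the smooth curve drawn on top
plot(wind ~ day, data = toy, col = "grey40")
lines(toy$day, smoothed, lwd = 3)
```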
See Figure 4.7 on page 94
CEO <- read_delim("http://nhorton.people.amherst.edu/sdm4/data/CEO_Salary_2012.txt", delim="\t")
favstats(~ One_Year_Pay, data=CEO)
## min Q1 median Q3 max mean sd n missing
## 0 3.88 6.97 13.4 131 10.5 11.5 500 0
histogram(~ One_Year_Pay, width=2.5, center=1.24, data=CEO)
See Figure 4.8 on page 95
nrow(CEO) # let's get rid of the CEOs with a salary of 0...
## [1] 500
CEO <- filter(CEO, One_Year_Pay > 0)
nrow(CEO)
## [1] 497
CEO <- mutate(CEO, log10salary = log10(One_Year_Pay*1000000))
histogram(~ log10salary, width=.25, center=.124, data=CEO)
On the log10 scale, we can roughly interpret the values as the number of digits in the CEO's salary: a salary with d digits has a log10 value between d - 1 and d.
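The digit-count reading of the log10 scale can be checked directly. The sketch below uses the quartiles and maximum from the favstats() output for One_Year_Pay, assuming (as the multiplication by 1000000 suggests) that pay is recorded in millions of dollars; the variable names are hypothetical. It also shows why the zero salaries had to be removed first: log10(0) is -Inf.

```r
# Q1, median, Q3, and max of One_Year_Pay from the favstats() output, in dollars
# (assumption: the original values are in millions of dollars)
pay <- c(3880000, 6970000, 13400000, 131000000)

log10pay <- log10(pay)

# Number of digits two ways: from the log, and by counting characters
digits_from_log <- floor(log10pay) + 1
digits_counted <- nchar(sprintf("%.0f", pay))

print(round(log10pay, 2))
print(digits_from_log)

# A salary of 0 has no finite log, hence the filter step
print(log10(0))
```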