This document is intended to help describe how to undertake analyses introduced as examples in the Fourth Edition of (2014) by De Veaux, Velleman, and Bock. More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This file as well as the associated R Markdown reproducible analysis source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4.
This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the mosaic
package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic).
Let’s begin by reproducing the figure on the bottom of page 519.
mu <- 1309
sd <- 15.7
xpnorm(c(mu-3*sd, mu-2*sd, mu-sd, mu+sd, mu+2*sd, mu+3*sd), mean=mu, sd=sd)
##
## If X ~ N(1309, 15.7), then
##
## P(X <= 1262) = P(Z <= -3) = 0.00135
## P(X <= 1278) = P(Z <= -2) = 0.02275
## P(X <= 1293) = P(Z <= -1) = 0.15866
## P(X <= 1325) = P(Z <= 1) = 0.84134
## P(X <= 1340) = P(Z <= 2) = 0.97725
## P(X <= 1356) = P(Z <= 3) = 0.99865
## P(X > 1262) = P(Z > -3) = 0.99865
## P(X > 1278) = P(Z > -2) = 0.97725
## P(X > 1293) = P(Z > -1) = 0.84134
## P(X > 1325) = P(Z > 1) = 0.15866
## P(X > 1340) = P(Z > 2) = 0.02275
## P(X > 1356) = P(Z > 3) = 0.00135
## [1] 0.00135 0.02275 0.15866 0.84134 0.97725 0.99865
Figure 20.1 (page 521) displays a normal curve (dashed green curve) and a t-model with 2 degrees of freedom (solid blue curve).
plotDist("norm", lty=2, col="green", lwd=2)
plotDist("t", params=2, lty=1, lwd=2, col="blue", add=TRUE)
We can reproduce the calculations for the Farmed salmon example (pages 523-524) using summary statistics:
n <- 150; ybar <- 0.0913; s = 0.0495
tstar <- qt(0.975, df=n-1); tstar
## [1] 1.98
ybar + c(-tstar, tstar)*s/sqrt(n)
## [1] 0.0833 0.0993
or directly:
Salmon <- read.csv("http://nhorton.people.amherst.edu/sdm4/data/Farmed_Salmon.csv")
favstats(~ Mirex, data=Salmon)
## min Q1 median Q3 max mean sd n missing
## 0 0.056 0.079 0.135 0.194 0.0913 0.0495 150 3
histogram(~ Mirex, width=0.01, center=0.01/2, data=Salmon)
t.test(~ Mirex, data=Salmon)
##
## One Sample t-test
##
## data: Salmon$Mirex
## t = 20, df = 100, p-value <2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.0833 0.0993
## sample estimates:
## mean of x
## 0.0913
We note that the distribution of measurements is not particularly normal.
We can carry out the one-sided test outlined on page 530:
tval <- (.0913-0.08)/0.0040; tval
## [1] 2.83
1-xpt(tval, df=149)
## [1] 0.00269