--- title: "SDM4 in R: Analysis of Variance (Chapter 26)" author: "Nicholas Horton (nhorton@amherst.edu) and Sarah McDonald" date: "June 13, 2018" output: pdf_document: fig_height: 2.8 fig_width: 7 html_document: fig_height: 3 fig_width: 5 word_document: fig_height: 4 fig_width: 6 --- ```{r, include = FALSE} # Don't delete this chunk if you are using the mosaic package # This loads the mosaic and dplyr packages require(mosaic) options(digits = 5) ``` ```{r, include = FALSE} # knitr settings to control how R chunks work. require(knitr) opts_chunk$set( tidy = FALSE, # display code as typed size = "small" # slightly smaller font for code ) ``` ## Introduction and background This document is intended to help describe how to undertake analyses introduced as examples in the Fourth Edition of *Stats: Data and Models}* (2014) by De Veaux, Velleman, and Bock. More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This file as well as the associated R Markdown reproducible analysis source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4. This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic). A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024. ## Chapter 26: Analysis of variance ### Section 26.1: Testing whether the means of several groups are zero The graph in Figure 26.1 (page 747) can be generated using the `bwplot()` function. ```{r} Soap <- read.csv("http://nhorton.people.amherst.edu/sdm4/data/Bacterial_Soap.csv") gf_boxplot(Bacterial.Counts ~ Method, data = Soap) ``` The example on page 750 considers the outcomes in hand volumes for three treatments post surgery. ```{r warning = FALSE} Contrast <- read.csv("http://nhorton.people.amherst.edu/sdm4/data/Contrast_baths.csv") gf_boxplot(Hand.Vol.Chg ~ Treatment, data = Contrast) ``` The summary statistics at the bottom of page 751 can be calculated using `favstats()`. ```{r} favstats(Bacterial.Counts ~ Method, data = Soap) ``` ### Section 26.2: The ANOVA table The `aov()` function can be used to fit an analysis of variance model. ```{r} aovmod <- aov(Bacterial.Counts ~ Method, data = Soap) summary(aovmod) ``` This model has 3 degrees of freedom for the model (numerator) and 28 degrees of freedom for the error (denominator). The `xpf()` function can replicate the calculation of the exact p-value (and generate Figure 26.4, page 754). ```{r} xpf(7.0636, df1 = 3, df2 = 28) ``` The treatment means can be generated using `model.tables()` (see page 757). ```{r} model.tables(aovmod) ``` The residual standard deviation can be calculated (page 759). ```{r} n <- 32 k <- 4 sp <- sqrt(sum(resid(aovmod)^2/(n-k))) sp sqrt(1410) ``` We can also see how the results are equivalent when fitting a regression model with indicators. ```{r} lmmod <- lm(Bacterial.Counts ~ Method, data = Soap) msummary(lmmod) ```