---
title: "Analysis of Variance (Chapter 24)"
author: "Patrick Frenett, Vickie Ip, Sarah McDonald, and Nicholas Horton (nhorton@amherst.edu)"
date: "June 29, 2018"
output:
pdf_document:
fig_height: 4
fig_width: 6
html_document:
fig_height: 3
fig_width: 5
word_document:
fig_height: 4
fig_width: 6
---
```{r, include = FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
```
```{r, include = FALSE}
# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
tidy = FALSE, # display code as typed
size = "small", # slightly smaller font for code
fig.algin = "center"
)
```
## Introduction and background
This document is intended to help describe how to undertake analyses introduced
as examples in the Fourth Edition of *Intro Stats* (2013) by De Veaux, Velleman, and Bock.
More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This
file as well as the associated R Markdown reproducible analysis source file used to create it can be found at https://nhorton.people.amherst.edu/is4.
This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic). A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.
Note that some of the figures in this document may differ slightly from those in the IS4 book due to small differences in datasets. However in all cases the analysis and techniques in R are accurate.
## Chapter 24: Analysis of variance
### Section 24.1: Testing whether the means of several groups are zero
The graph in Figure 24.1 (page 701) can be generated using the `gf_boxplot()` function.
```{r message = FALSE}
Soap <- read.csv("https://nhorton.people.amherst.edu/sdm4/data/Bacterial_Soap.csv")
gf_boxplot(Bacterial.Counts ~ Method, data = Soap, col = "royalblue2")
```
The example on page 704 considers the outcomes in hand volumes for three treatments
post surgery.
```{r warning = FALSE}
Contrast <- read.csv("https://nhorton.people.amherst.edu/sdm4/data/Contrast_baths.csv")
gf_boxplot(Hand.Vol.Chg ~ Treatment, data = Contrast, col = "lightsteelblue4")
```
The summary statistics at the bottom of page 705 can be calculated using `favstats()` (or `df_stats()`).
```{r}
favstats(Bacterial.Counts ~ Method, data = Soap)
```
### Section 24.2: The ANOVA table
The `aov()` function can be used to fit an analysis of variance model.
```{r}
aovmod <- aov(Bacterial.Counts ~ Method, data = Soap)
summary(aovmod)
```
This model has 3 degrees of freedom for the model (numerator) and 28 degrees of
freedom for the error (denominator). The `xpf()` function can replicate the calculation of the exact p-value (and generate Figure 24.4, page 708).
```{r}
xpf(7.0636, df1 = 3, df2 = 28)
```
The treatment means can be generated using `model.tables()` (see page 711).
```{r}
model.tables(aovmod)
```
The residual standard deviation can be calculated (page 713).
```{r}
n <- 32
k <- 4
sp <- sqrt(sum(resid(aovmod)^2/(n - k)))
sp
sqrt(1410)
```
We can also see how the results are equivalent when fitting a regression model
with indicators.
```{r}
lmmod <- lm(Bacterial.Counts ~ Method, data = Soap)
msummary(lmmod)
```
### Section 24.3: Assumptions and Conditions
A box plot of the residuals shown by figure 24.5 on page 715 can be generated with the `gf_boxplot()` function. Figure 24.6 is made using
the `gf_qq()` function.
```{r}
gf_boxplot(resid(lmmod) ~ Method, data = Soap)
gf_qq(~ resid(lmmod), data = Soap, col = "lightcoral") %>%
gf_labs(x = "qnorm", y = "resid(lmmod)")
```