\documentclass[11pt]{article}

\usepackage[margin=1in,bottom=.5in,includehead,includefoot]{geometry}
\usepackage{hyperref}
\usepackage{language}
\usepackage{alltt}
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhf{}

%% Now begin customising things. See the fancyhdr docs for more info.

\chead{}
\lhead[\sf \thepage]{\sf \leftmark}
\rhead[\sf \leftmark]{\sf \thepage}
\lfoot{}
\cfoot{Introduction to the Practice of Statistics using R: Chapter 12}
\rfoot{}

\newcounter{myenumi}
\newcommand{\saveenumi}{\setcounter{myenumi}{\value{enumi}}}
\newcommand{\reuseenumi}{\setcounter{enumi}{\value{myenumi}}}

\pagestyle{fancy}

\def\R{{\sf R}}
\def\Rstudio{{\sf RStudio}}
\def\RStudio{{\sf RStudio}}
\def\term#1{\textbf{#1}}
\def\tab#1{{\sf #1}}


\usepackage{relsize}

\newlength{\tempfmlength}
\newsavebox{\fmbox}
\newenvironment{fmpage}[1]
     {
   \medskip
   \setlength{\tempfmlength}{#1}
	 \begin{lrbox}{\fmbox}
	   \begin{minipage}{#1}
		 \vspace*{.02\tempfmlength}
		 \hfill
	   \begin{minipage}{.95 \tempfmlength}}
		 {\end{minipage}\hfill
		 \vspace*{.015\tempfmlength}
		 \end{minipage}\end{lrbox}\fbox{\usebox{\fmbox}}
	 \medskip
	 }


\newenvironment{boxedText}[1][.98\textwidth]%
{%
\begin{center}
\begin{fmpage}{#1}
}%
{%
\end{fmpage}
\end{center}
}

\newenvironment{boxedTable}[2][tbp]%
{%
\begin{table}[#1]
  \refstepcounter{table}
  \begin{center}
\begin{fmpage}{.98\textwidth}
  \begin{center}
	\sf \large Box~\expandafter\thetable. #2
\end{center}
\medskip
}%
{%
\end{fmpage}
\end{center}
\end{table}		% need to do something about exercises that follow boxedTable
}


\newcommand{\cran}{\href{http://www.R-project.org/}{CRAN}}

\title{Introduction to the Practice of Statistics using R: \\
Chapter 12}

\author{
Nicholas J. Horton\thanks{Department of Mathematics, Amherst College, nhorton@amherst.edu} \and Ben Baumer
} 

\date{\today}

\begin{document}


\maketitle
\tableofcontents

%\parindent=0pt


<<setup, include=FALSE, cache=FALSE>>=
opts_chunk$set(
  dev="pdf",
  tidy=FALSE,
  fig.path="figures/",
	fig.height=4,
	fig.width=5,
	out.width=".57\\textwidth",
	fig.keep="high",
	fig.show="hold",
	fig.align="center",
	prompt=TRUE,  # show the prompts; but perhaps we should not do this 
	comment=NA    
  )
options(continue=" ")
@

<<pvalues, echo=FALSE, message=FALSE>>=
print.pval = function(pval) {
  threshold = 0.0001
    return(ifelse(pval < threshold, paste("p<", sprintf("%.4f", threshold), sep=""),
                ifelse(pval > 0.1, paste("p=",round(pval, 2), sep=""),
                       paste("p=", round(pval, 3), sep=""))))
}
@

<<setup2,echo=FALSE,message=FALSE>>=
require(mosaic)
trellis.par.set(theme=col.mosaic())  # get a better color scheme for lattice
set.seed(123)
# this allows for code formatting inline.  Use \Sexpr{'function(x,y)'}, for exmaple.
knit_hooks$set(inline = function(x) {
if (is.numeric(x)) return(knitr:::format_sci(x, 'latex'))
x = as.character(x)
h = knitr:::hilight_source(x, 'latex', list(prompt=FALSE, size='normalsize'))
h = gsub("([_#$%&])", "\\\\\\1", h)
h = gsub('(["\'])', '\\1{}', h)
gsub('^\\\\begin\\{alltt\\}\\s*|\\\\end\\{alltt\\}\\s*$', '', h)
})
showOriginal=FALSE
showNew=TRUE
@ 

\section*{Introduction}

This document is intended to help describe how to undertake analyses introduced as examples in the Sixth Edition of \emph{Introduction to the 
Practice of Statistics} (2002) by David Moore, George McCabe and Bruce
Craig.
More information about the book can be found at \url{http://bcs.whfreeman.com/ips6e/}.  This
file as well as the associated \pkg{knitr} reproducible analysis source file can be found at
\url{http://www.math.smith.edu/~nhorton/ips6e}.

This work leverages initiatives undertaken by Project MOSAIC (\url{http://www.mosaic-web.org}), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the 
\pkg{mosaic} package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignette (\url{http://cran.r-project.org/web/packages/mosaic/vignettes/MinimalR.pdf}).

Additional examples of fitting multiple regression models can be found in the companion site which implements the examples within \emph{The Statistical Sleuth} in R (\url{http://www.math.smith.edu/~nhorton/sleuth}).

To use a package within R, it must be installed (one time), and loaded (each session). The 
packages can be installed using the following command:
<<install_mosaic,eval=FALSE>>=
install.packages('mosaic')               # note the quotation marks
install.packages('gmodels')               # note the quotation marks
@
The {\tt \#} character is a comment in R, and all text after that on the 
current line is ignored.

Once the package is installed (one time only), it can be loaded by running the command:
<<load_mosaic,eval=FALSE>>=
require(mosaic)
require(gmodels)
@
This
needs to be done once per session.

We also set some options to improve legibility of graphs and output.
<<eval=TRUE>>=
trellis.par.set(theme=col.mosaic())  # get a better color scheme for lattice
options(digits=3)
@

The specific goal of this document is to demonstrate how to replicate the
analysis described in Chapter 12: One-Way Analysis of Variance.

\section{Inference for One-way ANOVA}

\subsection{Exploratory analysis}

We consider the case study on workplace safety introduced on page 641 (Example 12.3).

<<>>=
ds = read.csv("http://www.math.smith.edu/ips6e/Ch12/ex12_003.csv")
favstats(SCI ~ jobcat, data=ds)
@
Variants of the
graphical displays (from Figure 12.3, page 642)
are reproduced below.  Note that the histograms (with overlaid normal curve) can be generated using separate stacked figures, or a single display can be created using overlapping density plots.
<<>>=
histogram(~ SCI| jobcat, fit="normal", layout=c(1, 3), data=ds)
@
<<>>=
densityplot(~ SCI, groups=jobcat, auto.key=TRUE, data=ds)
@
<<>>=
bwplot(SCI ~ jobcat, data=ds)
@

\subsection{Pooled standard deviation}

The pooled standard deviation can be easily calculated through the {\tt lm()} command:
<<>>=
ex12.5 = lm(SCI ~ jobcat, data=ds)
summary(ex12.5)
@
The value of 18.07 matches the results in Example 12.5 (page 647).  

\subsection{ANOVA table}

The ANOVA table (Figure 12.8, page 649) can be generated from this linear model
object.
<<>>=
anova(ex12.5)
@

\subsection{Decomposition}

As always, the total variability (SST) can be decomposed into part explained by the model (SSM or SSG, as described in Example 12.9, page 651) and part unexplained (SSE, or sums of squares for error).  

<<>>=
meanval = mean(~ SCI, data=ds)
SST = with(ds, sum((SCI - meanval)^2))
SST
SSM = sum((fitted(ex12.5) - meanval)^2)
SSM
SSE = sum((residuals(ex12.5)^2))
SSE
SSM + SSE
@

We can use these results to verify the value of $s_p$ (the pooled estimate of the parameter $\sigma$) from our model.  
<<>>=
MSE = SSE / (nrow(ds) - 2 - 1); MSE
sqrt(MSE)
@
This matches the value on page 652 (Example 12.11).  

\subsection{The F test}

The F distribution is used to test the overall hypotheses (and other multiple degree of 
freedom tests).  The p-value is the probability that a random variable having the $F(I-1, N-I)$ distribution is greater or equal to the calculated value of the F statistic.  The values from Example 12.12 (page 653) can be found using the {\tt qf()} function:
<<>>=
qf(c(.90, .95, .975, .99, .999), df1 = 2, df2 = 587)
@
(Note that the values in the book are only available for denominator degrees of freedom equal to 200, so the results are conservative).  

\subsection{Coefficient of determination}

As usual, the $R^2$ (or coefficient of determination) can be calculated in multiple ways:
<<>>=
r.squared(ex12.5)
SSM/SST
@

\section{Comparing the means}

\subsection{Contrasts}

Contrasts can be used to calculate specific one degree of freedom tests of hypotheses.  
Recall the means from the worker data:
<<>>=
mean(SCI ~ jobcat, data=ds)
@
We can also calculate these in terms of the regression parameter estimates:
<<>>=
mycoef = coef(ex12.5); mycoef
mycoef[1]
mycoef[1] + mycoef[2]
mycoef[1] + mycoef[3]
@

Contrasts can be fit using the {\tt fit.contrast()} function within the {\tt gmodels} package.
<<>>=
require(gmodels)
fit.contrast(ex12.5, "jobcat", c(-1/2, 1, -1/2))
@
This matches the results for the first contrast (Example 12.18, pages 658--659).  

A similar process is used to test the second contrast (Example 12.20, page 659):
<<>>=
fit.contrast(ex12.5, "jobcat", c(-1, 0, 1))
@

These values can be used to calculate a 95\% confidence interval for the difference in means:
<<>>=
-0.79 + c(-1, 1)*qt(.975, df=587) * 2.08
@

\subsection{Multiple comparisons}

A number of packages support the comparison of multiple tests using R (see for example
the {\tt multcomp} package).  

\subsection{Power}

The {\tt power.anova.test()} function can be used to calculate power and sample size
for a one-way ANOVA.  For the power of a reading comprehension study (Example 12.27, pages 668-669), this yields power of approximately 35\%.
<<>>=
power.anova.test(groups=3, n=10, within.var=7^2, between.var=var(c(41, 47, 44)))
@


\end{document}