---
title: "SDM4 in R: Comparing Groups (Chapter 22)"
author: "Nicholas Horton (nhorton@amherst.edu)"
date: "January 2, 2017"
output: 
  html_document:
    fig_height: 5
    fig_width: 7
  pdf_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---


```{r, include=FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
options(digits=3)
```

```{r, include=FALSE}
# Some customization.  You can alter or delete as desired (if you know what you are doing).

# This changes the default colors in lattice plots.
trellis.par.set(theme=theme.mosaic())  

# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
  tidy=FALSE,     # display code as typed
  size="small"    # slightly smaller font for code
)
```

## Introduction and background 

This document is intended to help describe how to undertake analyses introduced 
as examples in the Fourth Edition of \emph{Stats: Data and Models} (2014) by De Veaux, Velleman, and Bock.
More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series.  This
file as well as the associated R Markdown reproducible analysis source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4.

This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic).

## Chapter 22: Comparing Groups

### Section 22.1: The standard deviation of a difference

We can replicate the calculations in the example on the bottom of page 587.

```{r}
n1 <- 248; p1 <- 0.57
n2 <- 256; p2 <- 0.70
sediff <- sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2); sediff
```

### Section 22.3: Confidence interval for a difference

We can replicate the values from the example on page 590.
```{r}
(p2 - p1) + c(-1.96, 1.96)*sediff
```

### Section 22.4: Testing for a difference in proportions

We can replicate the values from the example on pages 594-595.

```{r}
n1 <- 293; y1 <- 205
n2 <- 469; y2 <- 235
ppooled <- (y1+y2)/(n1+n2); ppooled
sepooled <- sqrt(ppooled*(1-ppooled)/n1 + ppooled*(1-ppooled)/n2); sepooled
z <- (y1/n1 - y2/n2)/sepooled; z
pval <- 2*pnorm(z, lower.tail = FALSE); pval
```

### Section 22.6: Testing for a difference in means

```{r}
n1 <- 8; n2 <- 7
ybar1 <- 281.88; ybar2 <- 211.43
s1 <- 18.31; s2 <- 46.43
sediff <- sqrt(s1^2/n1 + s2^2/n2); sediff
t <- (ybar1 - ybar2)/sediff; t
pval <- 2*pt(t, df=7.62); pval
```

```{r}
prices <- read.csv("http://nhorton.people.amherst.edu/sdm4/data/Camera_prices.csv")
prices
with(prices, t.test(Buying.from.a.Friend, Buying.from.a.Stranger))
```

Let's turn this dataset in a lattice friendlier version.
```{r}
ds <- with(prices, 
  data.frame(price=c(Buying.from.a.Friend, Buying.from.a.Stranger),
             group=c(rep("Friend", nrow(prices)), rep("Stranger", nrow(prices)))))
ds
t.test(price ~ group, data=ds)   # Unpooled
t.test(price ~ group, var.equal=TRUE, data=ds)   # Pooled
bwplot(group ~ price, data=ds)
```