---
title: "SDM4 in R: Comparing Groups (Chapter 22)"
author: "Nicholas Horton (nhorton@amherst.edu) and Sarah McDonald"
date: "June 12, 2018"
output: 
  pdf_document:
    fig_height: 2.8
    fig_width: 7
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---


```{r, include = FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
options(digits = 3)
```

```{r, include = FALSE}
# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
  tidy = FALSE,     # display code as typed
  size = "small"    # slightly smaller font for code
)
```

## Introduction and background 

This document describes how to carry out the analyses introduced 
as examples in the Fourth Edition of *Stats: Data and Models* (2014) by De Veaux, Velleman, and Bock.
More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series.  This
file and the R Markdown source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4.

This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic).
A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.

## Chapter 22: Comparing Groups

### Section 22.1: The standard deviation of a difference

We can replicate the calculations in the example at the bottom of page 587.

```{r}
n1 <- 248 
p1 <- 0.57
n2 <- 256
p2 <- 0.70
sediff <- sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
sediff
```

### Section 22.3: Confidence interval for a difference

We can replicate the values from the example on page 590.

```{r}
(p2 - p1) + c(-1.96, 1.96)*sediff
```
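
The multiplier 1.96 used above is the 0.975 quantile of the standard normal distribution. As a minimal sketch (the names `conf_level`, `zstar`, `se`, and `ci` are ours, not the book's), the same interval can be built with `qnorm()` so that the confidence level is explicit and easy to change:

```{r}
# Summary values from the example in Section 22.1
n1 <- 248; p1 <- 0.57
n2 <- 256; p2 <- 0.70
conf_level <- 0.95                        # illustrative name, not from the book
zstar <- qnorm(1 - (1 - conf_level)/2)    # critical value, approximately 1.96
se <- sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2)
ci <- (p2 - p1) + c(-1, 1)*zstar*se
ci
```

Changing `conf_level` to, say, 0.90 or 0.99 gives the corresponding narrower or wider interval without looking up a new multiplier.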

### Section 22.4: Testing for a difference in proportions

We can replicate the values from the example on pages 594-595.

```{r}
n1 <- 293
y1 <- 205
n2 <- 469
y2 <- 235
ppooled <- (y1+y2)/(n1+n2)
ppooled
sepooled <- sqrt(ppooled*(1-ppooled)/n1 + ppooled*(1-ppooled)/n2)
sepooled
z <- (y1/n1 - y2/n2)/sepooled
z
pval <- 2*pnorm(z, lower.tail = FALSE)
pval
```
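
As a cross-check on the hand calculation, base R's `prop.test()` carries out the same pooled two-proportion test when the continuity correction is turned off. It reports a chi-squared statistic, which for two groups equals the square of the z statistic. A sketch, with illustrative object names (`successes`, `trials`, `result`) that are our own:

```{r}
# Counts and sample sizes from the example above
successes <- c(205, 235)
trials <- c(293, 469)
# correct = FALSE turns off the continuity correction so the result
# matches the z-test computed by hand
result <- prop.test(successes, trials, correct = FALSE)
result
sqrt(result$statistic)   # absolute value of the z statistic
result$p.value           # two-sided p-value
```

Note that the confidence interval printed by `prop.test()` uses the unpooled standard error, so it is not derived from `sepooled`.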

### Section 22.6: Testing for a difference in means

We can compute the t statistic and two-sided p-value from the summary statistics.

```{r}
n1 <- 8
n2 <- 7
ybar1 <- 281.88
ybar2 <- 211.43
s1 <- 18.31
s2 <- 46.43
sediff <- sqrt(s1^2/n1 + s2^2/n2)
sediff
t <- (ybar1 - ybar2)/sediff
t
# two-sided p-value: double the upper-tail area beyond |t|
pval <- 2*pt(abs(t), df = 7.62, lower.tail = FALSE)
pval
```
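
The value `df = 7.62` used above comes from the Welch-Satterthwaite approximation to the degrees of freedom for the unpooled two-sample t-test. A sketch of that calculation from the same summary statistics (the names `v1`, `v2`, and `df_welch` are ours):

```{r}
n1 <- 8; s1 <- 18.31
n2 <- 7; s2 <- 46.43
v1 <- s1^2/n1    # estimated variance of the first sample mean
v2 <- s2^2/n2    # estimated variance of the second sample mean
# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2/(n1 - 1) + v2^2/(n2 - 1))
df_welch
```

By default, `t.test()` applies the same approximation, so it reports a fractional number of degrees of freedom.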

```{r}
prices <- read.csv("http://nhorton.people.amherst.edu/sdm4/data/Camera_prices.csv")
prices
with(prices, t.test(Buying.from.a.Friend, Buying.from.a.Stranger))
```

Let's turn this dataset into a ggformula-friendlier version, with one row per price and a column identifying the group.

```{r, warning = FALSE}
ds <- with(prices, 
  data.frame(price = c(Buying.from.a.Friend, Buying.from.a.Stranger),
             group = c(rep("Friend", nrow(prices)), rep("Stranger", nrow(prices)))))
ds
t.test(price ~ group, data = ds)   # Unpooled or unequal variance
t.test(price ~ group, var.equal = TRUE, data = ds)   # Pooled or equal variance
gf_boxplot(price ~ group, data = ds) %>%
  gf_refine(coord_flip())
```