---
title: "Comparing Groups (Chapter 20)"
author: "Patrick Frenett, Vickie Ip, and Nicholas Horton (nhorton@amherst.edu)"
date: "June 22, 2018"
output:
pdf_document:
fig_height: 4
fig_width: 6
html_document:
fig_height: 3
fig_width: 5
word_document:
fig_height: 4
fig_width: 6
---
```{r, include = FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
```
```{r, include = FALSE}
# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
tidy = FALSE, # display code as typed
size = "small", # slightly smaller font for code
fig.align = "center"
)
```
## Introduction and background
This document is intended to help describe how to undertake analyses introduced
as examples in the Fourth Edition of *Intro Stats* (2013) by De Veaux, Velleman, and Bock.
More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This
file as well as the associated R Markdown reproducible analysis source file used to create it can be found at https://nhorton.people.amherst.edu/is4.
This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic). A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.
Note that some of the figures in this document may differ slightly from those in the IS4 book due to small differences in datasets. However in all cases the analysis and techniques in R are accurate.
## Chapter 20: Comparing Groups
### Section 20.1: The standard deviation of a difference
We can replicate the calculations in the example on the bottom of page 543.
```{r}
n1 <- 248
p1 <- 0.57
n2 <- 256
p2 <- 0.70
sediff <- sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2)
sediff
```
### Section 20.3: Confidence interval for a difference
We can replicate the values from the example on page 546.
```{r}
(p2 - p1) + c(-1.96, 1.96)*sediff
```
### Section 20.4: Testing for a difference in proportions
We can replicate the values from the example on pages 550-551.
```{r}
n1 <- 293
y1 <- 205
n2 <- 469
y2 <- 235
ppooled <- (y1 + y2)/(n1 + n2)
ppooled
sepooled <- sqrt(ppooled*(1 - ppooled)/n1 + ppooled*(1 - ppooled)/n2)
sepooled
z <- (y1/n1 - y2/n2)/sepooled
z
pval <- 2*pnorm(z, lower.tail = FALSE)
pval
```
### Section 20.6: Testing for a difference in means
```{r}
n1 <- 8
n2 <- 7
ybar1 <- 281.88
ybar2 <- 211.43
s1 <- 18.31
s2 <- 46.43
sediff <- sqrt(s1^2/n1 + s2^2/n2)
sediff
t <- (ybar1 - ybar2)/sediff
t
pval <- 2*pt(t, df = 7.62)
pval
```
```{r message = FALSE}
prices <- read.csv("https://nhorton.people.amherst.edu/sdm4/data/Camera_prices.csv")
prices
with(prices, t.test(Buying.from.a.Friend, Buying.from.a.Stranger))
```
```{r warning = FALSE}
ds <- with(prices,
data.frame(price = c(Buying.from.a.Friend, Buying.from.a.Stranger),
group = c(rep("Friend", nrow(prices)), rep("Stranger", nrow(prices)))))
ds
t.test(price ~ group, data = ds) # Unpooled
t.test(price ~ group, var.equal = TRUE, data = ds) # Pooled
gf_boxplot(price ~ group, data = ds) %>%
gf_labs(x = "Group", y = "Price")
```