---
title: "SDM4 in R: Comparing Groups (Chapter 22)"
author: "Nicholas Horton (nhorton@amherst.edu) and Sarah McDonald"
date: "June 12, 2018"
output:
  pdf_document:
    fig_height: 2.8
    fig_width: 7
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---
```{r, include = FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
options(digits = 3)
```
```{r, include = FALSE}
# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
  tidy = FALSE,     # display code as typed
  size = "small"    # slightly smaller font for code
)
```
## Introduction and background
This document describes how to carry out the analyses introduced
as examples in the Fourth Edition of *Stats: Data and Models* (2014) by De Veaux, Velleman, and Bock.
More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This
file, along with the R Markdown source file used to create it, can be found at http://nhorton.people.amherst.edu/sdm4.

This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we use the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic).

A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.

## Chapter 22: Comparing Groups
### Section 22.1: The standard deviation of a difference
We can replicate the calculations in the example at the bottom of page 587.
```{r}
n1 <- 248
p1 <- 0.57
n2 <- 256
p2 <- 0.70
sediff <- sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
sediff
```
### Section 22.3: Confidence interval for a difference
We can replicate the values from the example on page 590.
```{r}
(p2 - p1) + c(-1.96, 1.96)*sediff
```
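The multiplier 1.96 used above is the rounded 97.5th percentile of the standard normal distribution. As a minimal sketch, the same interval can be wrapped in a small helper that derives the critical value from `qnorm()`. The function name `ci_diff_prop` and its arguments are hypothetical; they are not part of the book or the `mosaic` package.

```{r}
# Hypothetical helper (not from the book or mosaic): normal-approximation
# confidence interval for p2 - p1 using the unpooled standard error.
ci_diff_prop <- function(p1, n1, p2, n2, level = 0.95) {
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  zstar <- qnorm(1 - (1 - level) / 2)  # critical value; about 1.96 for 95%
  (p2 - p1) + c(-1, 1) * zstar * se
}

# Same inputs as the example above, passed explicitly so the chunk stands alone
ci_example <- ci_diff_prop(p1 = 0.57, n1 = 248, p2 = 0.70, n2 = 256)
ci_example
```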
### Section 22.4: Testing for a difference in proportions
We can replicate the values from the example on pages 594-595.
```{r}
n1 <- 293
y1 <- 205
n2 <- 469
y2 <- 235
ppooled <- (y1+y2)/(n1+n2)
ppooled
sepooled <- sqrt(ppooled*(1-ppooled)/n1 + ppooled*(1-ppooled)/n2)
sepooled
z <- (y1/n1 - y2/n2)/sepooled
z
pval <- 2*pnorm(z, lower.tail = FALSE)
pval
```
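As a cross-check, base R's `prop.test()` carries out the same pooled two-proportion test when the continuity correction is turned off; its chi-squared statistic is the square of the z statistic. The sketch below re-enters the counts under illustrative names (`successes`, `trials`, `z_manual`) so that it runs on its own and does not overwrite the objects above.

```{r}
# Counts re-entered under illustrative names so the chunk is self-contained
successes <- c(205, 235)
trials <- c(293, 469)

# Two-proportion test without the continuity correction
check <- prop.test(successes, trials, correct = FALSE)
check

# Manual pooled z statistic for comparison
pooled <- sum(successes) / sum(trials)
se_pooled <- sqrt(pooled * (1 - pooled) * sum(1 / trials))
z_manual <- (successes[1] / trials[1] - successes[2] / trials[2]) / se_pooled

# The chi-squared statistic should equal z squared
c(z = z_manual, sqrt_chisq = sqrt(unname(check$statistic)))
```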
### Section 22.6: Testing for a difference in means
```{r}
n1 <- 8
n2 <- 7
ybar1 <- 281.88
ybar2 <- 211.43
s1 <- 18.31
s2 <- 46.43
sediff <- sqrt(s1^2/n1 + s2^2/n2)
sediff
t <- (ybar1 - ybar2)/sediff
t
pval <- 2*pt(t, df = 7.62, lower.tail = FALSE) # t is positive, so use the upper tail
pval
```
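The degrees of freedom used above (7.62) match the Welch-Satterthwaite approximation, which `t.test()` applies by default when variances are not assumed equal. A minimal sketch of that calculation from the summary statistics follows; the variable names (`v_a`, `v_b`, `df_welch`) are illustrative and chosen so as not to overwrite the objects above.

```{r}
# Summary statistics re-entered so the chunk runs on its own
n_a <- 8
s_a <- 18.31
n_b <- 7
s_b <- 46.43

# Each group's contribution to the squared standard error of the difference
v_a <- s_a^2 / n_a
v_b <- s_b^2 / n_b

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v_a + v_b)^2 / (v_a^2 / (n_a - 1) + v_b^2 / (n_b - 1))
df_welch
```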
```{r}
prices <- read.csv("http://nhorton.people.amherst.edu/sdm4/data/Camera_prices.csv")
prices
with(prices, t.test(Buying.from.a.Friend, Buying.from.a.Stranger))
```
Let's reshape this dataset into a long format (one row per price, with a grouping variable) that works better with ggformula.
```{r, warning = FALSE}
ds <- with(prices,
data.frame(price = c(Buying.from.a.Friend, Buying.from.a.Stranger),
group = c(rep("Friend", nrow(prices)), rep("Stranger", nrow(prices)))))
ds
t.test(price ~ group, data = ds) # Unpooled or unequal variance
t.test(price ~ group, var.equal = TRUE, data = ds) # Pooled or equal variance
gf_boxplot(price ~ group, data = ds) %>%
gf_refine(coord_flip())
```