--- title: "SDM4 in R: Comparing Groups (Chapter 22)" author: "Nicholas Horton (nhorton@amherst.edu) and Sarah McDonald" date: "June 12, 2018" output: pdf_document: fig_height: 2.8 fig_width: 7 html_document: fig_height: 3 fig_width: 5 word_document: fig_height: 4 fig_width: 6 --- ```{r, include = FALSE} # Don't delete this chunk if you are using the mosaic package # This loads the mosaic and dplyr packages require(mosaic) options(digits = 3) ``` ```{r, include = FALSE} # knitr settings to control how R chunks work. require(knitr) opts_chunk$set( tidy = FALSE, # display code as typed size = "small" # slightly smaller font for code ) ``` ## Introduction and background This document is intended to help describe how to undertake analyses introduced as examples in the Fourth Edition of *Stats: Data and Models* (2014) by De Veaux, Velleman, and Bock. More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This file as well as the associated R Markdown reproducible analysis source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4. This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic). A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024. ## Chapter 22: Comparing Groups ### Section 22.1: The standard deviation of a difference We can replicate the calculations in the example on the bottom of page 587. ```{r} n1 <- 248 p1 <- 0.57 n2 <- 256 p2 <- 0.70 sediff <- sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2) sediff ``` ### Section 22.3: Confidence interval for a difference We can replicate the values from the example on page 590. ```{r} (p2 - p1) + c(-1.96, 1.96)*sediff ``` ### Section 22.4: Testing for a difference in proportions We can replicate the values from the example on pages 594-595. ```{r} n1 <- 293 y1 <- 205 n2 <- 469 y2 <- 235 ppooled <- (y1+y2)/(n1+n2) ppooled sepooled <- sqrt(ppooled*(1-ppooled)/n1 + ppooled*(1-ppooled)/n2) sepooled z <- (y1/n1 - y2/n2)/sepooled z pval <- 2*pnorm(z, lower.tail = FALSE) pval ``` ### Section 22.6: Testing for a difference in means ```{r} n1 <- 8 n2 <- 7 ybar1 <- 281.88 ybar2 <- 211.43 s1 <- 18.31 s2 <- 46.43 sediff <- sqrt(s1^2/n1 + s2^2/n2) sediff t <- (ybar1 - ybar2)/sediff t pval <- 2*pt(t, df = 7.62) pval ``` ```{r} prices <- read.csv("http://nhorton.people.amherst.edu/sdm4/data/Camera_prices.csv") prices with(prices, t.test(Buying.from.a.Friend, Buying.from.a.Stranger)) ``` Let's turn this dataset in a ggformula friendlier version. ```{r warning = FALSE} ds <- with(prices, data.frame(price = c(Buying.from.a.Friend, Buying.from.a.Stranger), group = c(rep("Friend", nrow(prices)), rep("Stranger", nrow(prices))))) ds t.test(price ~ group, data = ds) # Unpooled or unequal variance t.test(price ~ group, var.equal = TRUE, data = ds) # Pooled or equal variance gf_boxplot(price ~ group, data = ds) %>% gf_refine(coord_flip()) ```