---
title: "SDM4 in R: Comparing Groups (Chapter 22)"
author: "Nicholas Horton (nhorton@amherst.edu)"
date: "January 2, 2017"
output:
  html_document:
    fig_height: 5
    fig_width: 7
  pdf_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---

```{r, include=FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
options(digits=3)
```

```{r, include=FALSE}
# Some customization. You can alter or delete as desired (if you know what you are doing).

# This changes the default colors in lattice plots.
trellis.par.set(theme=theme.mosaic())

# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
  tidy=FALSE,     # display code as typed
  size="small"    # slightly smaller font for code
)
```

## Introduction and background

This document is intended to help describe how to undertake analyses introduced as examples in the Fourth Edition of *Stats: Data and Models* (2014) by De Veaux, Velleman, and Bock. More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This file, as well as the associated R Markdown reproducible analysis source file used to create it, can be found at http://nhorton.people.amherst.edu/sdm4.

This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic).

## Chapter 22: Comparing Groups

### Section 22.1: The standard deviation of a difference

We can replicate the calculations in the example on the bottom of page 587.

```{r}
n1 <- 248; p1 <- 0.57
n2 <- 256; p2 <- 0.70
sediff <- sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2); sediff
```

### Section 22.3: Confidence interval for a difference

We can replicate the values from the example on page 590.

```{r}
(p2 - p1) + c(-1.96, 1.96)*sediff
```

### Section 22.4: Testing for a difference in proportions

We can replicate the values from the example on pages 594-595.

```{r}
n1 <- 293; y1 <- 205
n2 <- 469; y2 <- 235
ppooled <- (y1+y2)/(n1+n2); ppooled
sepooled <- sqrt(ppooled*(1-ppooled)/n1 + ppooled*(1-ppooled)/n2); sepooled
z <- (y1/n1 - y2/n2)/sepooled; z
pval <- 2*pnorm(z, lower.tail = FALSE); pval
```

### Section 22.6: Testing for a difference in means

```{r}
n1 <- 8; n2 <- 7
ybar1 <- 281.88; ybar2 <- 211.43
s1 <- 18.31; s2 <- 46.43
sediff <- sqrt(s1^2/n1 + s2^2/n2); sediff
t <- (ybar1 - ybar2)/sediff; t
# two-sided p-value: t is positive here, so we need the upper tail
pval <- 2*pt(abs(t), df=7.62, lower.tail=FALSE); pval
```

```{r}
prices <- read.csv("http://nhorton.people.amherst.edu/sdm4/data/Camera_prices.csv")
prices
with(prices, t.test(Buying.from.a.Friend, Buying.from.a.Stranger))
```

Let's reshape this dataset into a long format (one row per price, with a grouping variable) that is friendlier to lattice and the formula interface.

```{r}
ds <- with(prices,
  data.frame(price=c(Buying.from.a.Friend, Buying.from.a.Stranger),
             group=c(rep("Friend", nrow(prices)), rep("Stranger", nrow(prices)))))
ds
t.test(price ~ group, data=ds)  # Unpooled
t.test(price ~ group, var.equal=TRUE, data=ds)  # Pooled
bwplot(group ~ price, data=ds)
```
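
The summary-statistic calculation in Section 22.6 hard-codes `df=7.62`. As a minimal sketch, and assuming that value comes from the Welch-Satterthwaite approximation (the one `t.test()` uses by default when `var.equal=FALSE`), the degrees of freedom can be reproduced from the same summary statistics. The names `v1`, `v2`, `df_welch`, `tstat`, and `pval_welch` are introduced here for illustration only.

```{r}
# Summary statistics copied from the Section 22.6 chunk
n1 <- 8; n2 <- 7
ybar1 <- 281.88; ybar2 <- 211.43
s1 <- 18.31; s2 <- 46.43

# Estimated variance of each sample mean
v1 <- s1^2/n1
v2 <- s2^2/n2

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2/(n1 - 1) + v2^2/(n2 - 1)); df_welch

# Two-sided p-value using the computed df instead of a hard-coded value
tstat <- (ybar1 - ybar2)/sqrt(v1 + v2); tstat
pval_welch <- 2*pt(abs(tstat), df=df_welch, lower.tail=FALSE); pval_welch
```

Computing the degrees of freedom in code keeps the test consistent if the summary statistics change, and it should agree with the `df` that `t.test()` reports for the unpooled comparison when given the raw data.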