---
title: "Confidence intervals for proportions (Chapter 16)"
author: "Patrick Frenett, Vickie Ip, and Nicholas Horton (nhorton@amherst.edu)"
date: "June 19, 2018"
output: 
  pdf_document:
    fig_height: 4
    fig_width: 6
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---
  
  
```{r, include = FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
```

```{r, include = FALSE}
# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
  tidy = FALSE,     # display code as typed
  size = "small",    # slightly smaller font for code
  fig.align = "center"
)
```

## Introduction and background 

This document is intended to help describe how to undertake analyses introduced 
as examples in the Fourth Edition of *Intro Stats* (2013) by De Veaux, Velleman, and Bock.
More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series.  This
file as well as the associated R Markdown reproducible analysis source file used to create it can be found at https://nhorton.people.amherst.edu/is4.

This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic). A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.

Note that some of the figures in this document may differ slightly from those in the IS4 book due to small differences in datasets. However in all cases the analysis and techniques in R are accurate.

## Chapter 16: Confidence intervals for proportions

### Section 16.1: A confidence interval

The Facebook survey of 156 respondents yielded 48 who updated their status every day (or more often).

```{r}
n <- 156
binom.test(48, n)    # exact binomial
```

Calculation on page 429
```{r}
phat <- 48/n
phat
sep <- sqrt(phat*(1 - phat)/n)
sep    # matches value on page 429
interval <- phat + c(-2, 2)*sep
interval
```

### Section 16.2: Interpreting confidence intervals

```{r warning = FALSE}
set.seed(1988)
CIsim(n = 100, samples = 20)
```

We would expect 19 out of 20 of the intervals to cover the true (population) value, but here only 18 out of 20 actually covered that value (see Figure at top of page 432).

### Section 16.3: Margin of error

We can replicate the calculation for the "For Example: Finding the margin of error Take 1" (page 434)

```{r}
sdp <- sqrt(.5*.5/1010)
sdp   # worst case margin of error (based on p=0.5)
me <- 2*sdp
me
```

We can replicate the calculation for the "For Example: Finding the margin of error Take 1" (pages 434-435)

```{r}
qnorm(.95, mean = 0, sd = 1)   # z-star for 90% confidence interval
sep <- sqrt(.4*.6/1010)
sep
me <- 1.6445*sep
me
```

### Section 16.4: Assumptions and Condition

We can replicate the calculation for the "For Example: choosing a sample size" (page 438)

```{r}
zstar <- qnorm(.975, mean = 0, sd = 1)
zstar
me <- 0.02 # desired margin of error
p <- 0.40
n <- (zstar*sqrt(p*(1 - p))/me)^2
n
```

We will need about 2305 subjects to yield a margin of error of 2% under these assumptions.