---
title: "SDM4 in R: Probability Models (Chapter 16)"
author: "Nicholas Horton (nhorton@amherst.edu)"
date: "August 12, 2017"
output: 
  pdf_document:
    fig_height: 2.8
    fig_width: 6
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---


```{r, include=FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
```

```{r, include=FALSE}
# Some customization.  You can alter or delete as desired (if you know what you are doing).

# This changes the default colors in lattice plots.
trellis.par.set(theme=theme.mosaic())  

# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
  tidy=FALSE,     # display code as typed
  size="small"    # slightly smaller font for code
)
```

## Introduction and background 

This document is intended to help describe how to undertake analyses introduced 
as examples in the Fourth Edition of \emph{Stats: Data and Models} (2014) by De Veaux, Velleman, and Bock.
More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series.  This
file as well as the associated R Markdown reproducible analysis source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4.

This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic).
A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.

## Chapter 16: Probability Models

### Section 16.1: Bernoulli models

We can replicate the calculation on page 413:

```{r}
library(mosaic); library(readr); options(digits=3)
dbinom(0, size=1, prob=0.2)  # Probability that you don't get Hope's picture
dbinom(1, size=1, prob=0.2)  # Probability that you do
dgeom(1, prob=0.2)  # Probability that you get it on the second try P(X=1)
# Note that the geometric in R doesn't include the final success!
```

### Section 16.2: The Geometric model

We can replicatethe example from page 414:

```{r}
p <- 0.09; p
plotDist("geom", p, xlim=c(-1, 40), xlab="x", ylab="P(X=x)", main="Geometric distribution")
1/p   # true expected value
mean(~ rgeom(20000, prob=p)) #  simulation to check
```

and the Think/Show/Tell on pages 415-416:

```{r}
p <- 0.06
dgeom(0, prob=p)
dgeom(1, prob=p)
dgeom(2, prob=p)
dgeom(3, prob=p)
pgeom(3, prob=p)
```

### Section 16.3: The Binomial model

Let's replicate the example on page 417:

```{r}
p <- 0.2; p    # back to Hope Solo example
n <- 5
plotDist("binom", params=c(size=n, prob=p), xlim=c(-0.2, 5.2), 
         xlab="x", ylab="P(X=x)", main="Binomial distribution")
n*p   # true expected value
n*p*(1-p)  # true variance
mysamp <- rbinom(20000, size=n, prob=p) #  simulation to check
mean(~ mysamp)
var(~ mysamp)
dbinom(2, size=5, prob=p)
```

or the SPAM example on page 418:
```
dbinom(1, size=25, prob=0.09)
dbinom(2, size=25, prob=0.09)
sum(dbinom(1:2, size=25, prob=0.09))
```

### Section 16.4: Normal approximation to the binomial

Let's replicate the example on page 419:

```{r}
p <- 0.06; n <- 32000
plotDist("binom", params=list(n, p), xlab="x", ylab="P(X=x)")
ex <- n*p; varx <- n*p*(1-p)
plotDist("norm", params=list(ex, sqrt(varx)), xlab="x", ylab="f(x)")
pbinom(1849, n, p)
pnorm(1849, ex, sqrt(varx))
```

Your parents had to use the normal approximation.  The good news is that you don't have to if you use R.

### Section 16.5: The continuity correction

Let's replicate the example on page 421 to demonstrate that we don't need the continuity correction if we calculate the exact distribution of the binomial.

```{r}
p <- 0.2; n <- 50
dbinom(10, n, p)
pnorm(10.5, n*p, sqrt(n*p*(1-p))) - pnorm(9.5, n*p, sqrt(n*p*(1-p)))  # continuity correction
```

### Section 16.6: The Poisson model

Let's replicate the example on page 424:

```{r}
lambda <- 35000*0.00011
plotDist("pois", lambda, xlab="x", ylab="P(X=x)")
1 - ppois(7, lambda)
```

### Section 16.7: Uniform and exponential

Let's replicate the example on page 426 for the uniform distribution:

```{r}
x <- runif(20000, min=0, max=20)
mean(~ x)
sd(~ x)
```

and on page 427 for the exponential distribution:
```{r}
plotDist("exp", 1, xlim=c(0, 5), xlab="x", ylab="f(x)")
pexp(1/3, rate=4)   # rate is 1/E[X]
```