---
title: "SDM4 in R: Probability Models (Chapter 16)"
author: "Nicholas Horton (nhorton@amherst.edu)"
date: "June 13, 2018"
output: 
  pdf_document:
    fig_height: 2.8
    fig_width: 6
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---


```{r, include=FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
```

```{r, include=FALSE}
# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
  tidy=FALSE,     # display code as typed
  size="small"    # slightly smaller font for code
)
```

## Introduction and background 

This document is intended to help describe how to undertake analyses introduced 
as examples in the Fourth Edition of *Stats: Data and Models* (2014) by De Veaux, Velleman, and Bock.
More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series.  This
file as well as the associated R Markdown reproducible analysis source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4.

This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic).
A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.

## Chapter 16: Probability Models

### Section 16.1: Bernoulli models

We can replicate the calculation on page 413:

```{r}
library(mosaic)
options(digits = 3)
dbinom(0, size = 1, prob = 0.2)  # Probability that you don't get Hope's picture
dbinom(1, size = 1, prob = 0.2)  # Probability that you do
dgeom(1, prob = 0.2)  # Probability that you get it on the second try P(X=1)
# Note that the geometric in R doesn't include the final success!
```

### Section 16.2: The Geometric model

We can replicate the example from page 414:

```{r}
p <- 0.09
gf_dist("geom", prob = p, 
  main = "Geometric distribution", ylab = "P(X=x)")
1/p   # true expected value
mean(~ rgeom(20000, prob = p)) #  simulation to check
```

along with the Think/Show/Tell on pages 415-416:

```{r}
p <- 0.06
dgeom(0:3, prob = p)
pgeom(3, prob = p)
```

### Section 16.3: The Binomial model

Let's replicate the example on page 417:

```{r}
p <- 0.2    # back to Hope Solo example
n <- 5
gf_dist("binom", size = n, prob = p, xlab = "x", ylab = "P(X=x)", 
  title = "Binomial distribution")
n*p   # true expected value
n*p*(1-p)  # true variance
mysamp <- rbinom(20000, size = n, prob = p) #  simulation to check
mean(~ mysamp)
var(~ mysamp)
dbinom(2, size = 5, prob = p)
```

or the SPAM example on page 418:
```{r}
dbinom(1, size = 25, prob=0.09)
dbinom(2, size = 25, prob = 0.09)
sum(dbinom(1:2, size = 25, prob = 0.09))
```

### Section 16.4: Normal approximation to the binomial

Let's replicate the example on page 419:

```{r}
p <- 0.06; n <- 32000
gf_dist("binom", size = n, prob = p, xlab = "x", ylab = "P(X=x)")
ex <- n*p
varx <- n*p*(1-p)
gf_dist("norm", mean = ex, sd = sqrt(varx), xlab = "x", ylab = "P(X=x)")
pbinom(1849, n, p)
pnorm(1849, ex, sqrt(varx))
```

Your parents had to use the normal approximation. :-) The good news is that you don't have to if you use R (or other modern software).

### Section 16.5: The continuity correction

Let's repeat the example on page 421 to demonstrate that we don't need the continuity correction if we calculate the exact distribution of the binomial.

```{r}
p <- 0.2
n <- 50
dbinom(10, n, p)
pnorm(10.5, n * p, sqrt(n * p * (1 - p))) - 
  pnorm(9.5, n * p, sqrt(n * p *(1 - p)))  # continuity correction
```

### Section 16.6: The Poisson model

Let's replicate the example on page 424:

```{r}
lambda <- 35000 * 0.00011
gf_dist("pois", lambda = lambda, xlab = "x", ylab = "P(X=x)")
1 - ppois(7, lambda)
```

### Section 16.7: Uniform and exponential

Let's replicate the example on page 426 for the uniform distribution:

```{r}
x <- runif(20000, min = 0, max = 20)
mean(~ x)
sd(~ x)
```

and on page 427 for the exponential distribution:
```{r}
gf_dist("exp", rate = 1, xlab = "x", ylab = "f(x)")
pexp(1/3, rate = 4)   # rate is 1/E[X]
```