---
title: "SDM4 in R: Probability Models (Chapter 16)"
author: "Nicholas Horton (nhorton@amherst.edu)"
date: "June 13, 2018"
output:
pdf_document:
fig_height: 2.8
fig_width: 6
html_document:
fig_height: 3
fig_width: 5
word_document:
fig_height: 4
fig_width: 6
---
```{r, include=FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
```
```{r, include=FALSE}
# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
tidy=FALSE, # display code as typed
size="small" # slightly smaller font for code
)
```
## Introduction and background
This document is intended to help describe how to undertake analyses introduced
as examples in the Fourth Edition of *Stats: Data and Models* (2014) by De Veaux, Velleman, and Bock.
More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This
file as well as the associated R Markdown reproducible analysis source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4.
This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic).
A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.
## Chapter 16: Probability Models
### Section 16.1: Bernoulli models
We can replicate the calculation on page 413:
```{r}
library(mosaic)
options(digits = 3)
dbinom(0, size = 1, prob = 0.2) # Probability that you don't get Hope's picture
dbinom(1, size = 1, prob = 0.2) # Probability that you do
dgeom(1, prob = 0.2) # Probability that you get it on the second try P(X=1)
# Note that the geometric in R doesn't include the final success!
```
### Section 16.2: The Geometric model
We can replicate the example from page 414:
```{r}
p <- 0.09
gf_dist("geom", prob = p,
main = "Geometric distribution", ylab = "P(X=x)")
1/p # true expected value
mean(~ rgeom(20000, prob = p)) # simulation to check
```
along with the Think/Show/Tell on pages 415-416:
```{r}
p <- 0.06
dgeom(0:3, prob = p)
pgeom(3, prob = p)
```
### Section 16.3: The Binomial model
Let's replicate the example on page 417:
```{r}
p <- 0.2 # back to Hope Solo example
n <- 5
gf_dist("binom", size = n, prob = p, xlab = "x", ylab = "P(X=x)",
title = "Binomial distribution")
n*p # true expected value
n*p*(1-p) # true variance
mysamp <- rbinom(20000, size = n, prob = p) # simulation to check
mean(~ mysamp)
var(~ mysamp)
dbinom(2, size = 5, prob = p)
```
or the SPAM example on page 418:
```{r}
dbinom(1, size = 25, prob=0.09)
dbinom(2, size = 25, prob = 0.09)
sum(dbinom(1:2, size = 25, prob = 0.09))
```
### Section 16.4: Normal approximation to the binomial
Let's replicate the example on page 419:
```{r}
p <- 0.06; n <- 32000
gf_dist("binom", size = n, prob = p, xlab = "x", ylab = "P(X=x)")
ex <- n*p
varx <- n*p*(1-p)
gf_dist("norm", mean = ex, sd = sqrt(varx), xlab = "x", ylab = "P(X=x)")
pbinom(1849, n, p)
pnorm(1849, ex, sqrt(varx))
```
Your parents had to use the normal approximation. :-) The good news is that you don't have to if you use R (or other modern software).
### Section 16.5: The continuity correction
Let's repeat the example on page 421 to demonstrate that we don't need the continuity correction if we calculate the exact distribution of the binomial.
```{r}
p <- 0.2
n <- 50
dbinom(10, n, p)
pnorm(10.5, n * p, sqrt(n * p * (1 - p))) -
pnorm(9.5, n * p, sqrt(n * p *(1 - p))) # continuity correction
```
### Section 16.6: The Poisson model
Let's replicate the example on page 424:
```{r}
lambda <- 35000 * 0.00011
gf_dist("pois", lambda = lambda, xlab = "x", ylab = "P(X=x)")
1 - ppois(7, lambda)
```
### Section 16.7: Uniform and exponential
Let's replicate the example on page 426 for the uniform distribution:
```{r}
x <- runif(20000, min = 0, max = 20)
mean(~ x)
sd(~ x)
```
and on page 427 for the exponential distribution:
```{r}
gf_dist("exp", rate = 1, xlab = "x", ylab = "f(x)")
pexp(1/3, rate = 4) # rate is 1/E[X]
```