--- title: "SDM4 in R: Probability Models (Chapter 16)" author: "Nicholas Horton (nhorton@amherst.edu)" date: "June 13, 2018" output: pdf_document: fig_height: 2.8 fig_width: 6 html_document: fig_height: 3 fig_width: 5 word_document: fig_height: 4 fig_width: 6 --- ```{r, include=FALSE} # Don't delete this chunk if you are using the mosaic package # This loads the mosaic and dplyr packages require(mosaic) ``` ```{r, include=FALSE} # knitr settings to control how R chunks work. require(knitr) opts_chunk$set( tidy=FALSE, # display code as typed size="small" # slightly smaller font for code ) ``` ## Introduction and background This document is intended to help describe how to undertake analyses introduced as examples in the Fourth Edition of *Stats: Data and Models* (2014) by De Veaux, Velleman, and Bock. More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This file as well as the associated R Markdown reproducible analysis source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4. This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic). A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024. ## Chapter 16: Probability Models ### Section 16.1: Bernoulli models We can replicate the calculation on page 413: ```{r} library(mosaic) options(digits = 3) dbinom(0, size = 1, prob = 0.2) # Probability that you don't get Hope's picture dbinom(1, size = 1, prob = 0.2) # Probability that you do dgeom(1, prob = 0.2) # Probability that you get it on the second try P(X=1) # Note that the geometric in R doesn't include the final success! ``` ### Section 16.2: The Geometric model We can replicate the example from page 414: ```{r} p <- 0.09 gf_dist("geom", prob = p, main = "Geometric distribution", ylab = "P(X=x)") 1/p # true expected value mean(~ rgeom(20000, prob = p)) # simulation to check ``` along with the Think/Show/Tell on pages 415-416: ```{r} p <- 0.06 dgeom(0:3, prob = p) pgeom(3, prob = p) ``` ### Section 16.3: The Binomial model Let's replicate the example on page 417: ```{r} p <- 0.2 # back to Hope Solo example n <- 5 gf_dist("binom", size = n, prob = p, xlab = "x", ylab = "P(X=x)", title = "Binomial distribution") n*p # true expected value n*p*(1-p) # true variance mysamp <- rbinom(20000, size = n, prob = p) # simulation to check mean(~ mysamp) var(~ mysamp) dbinom(2, size = 5, prob = p) ``` or the SPAM example on page 418: ```{r} dbinom(1, size = 25, prob=0.09) dbinom(2, size = 25, prob = 0.09) sum(dbinom(1:2, size = 25, prob = 0.09)) ``` ### Section 16.4: Normal approximation to the binomial Let's replicate the example on page 419: ```{r} p <- 0.06; n <- 32000 gf_dist("binom", size = n, prob = p, xlab = "x", ylab = "P(X=x)") ex <- n*p varx <- n*p*(1-p) gf_dist("norm", mean = ex, sd = sqrt(varx), xlab = "x", ylab = "P(X=x)") pbinom(1849, n, p) pnorm(1849, ex, sqrt(varx)) ``` Your parents had to use the normal approximation. :-) The good news is that you don't have to if you use R (or other modern software). ### Section 16.5: The continuity correction Let's repeat the example on page 421 to demonstrate that we don't need the continuity correction if we calculate the exact distribution of the binomial. ```{r} p <- 0.2 n <- 50 dbinom(10, n, p) pnorm(10.5, n * p, sqrt(n * p * (1 - p))) - pnorm(9.5, n * p, sqrt(n * p *(1 - p))) # continuity correction ``` ### Section 16.6: The Poisson model Let's replicate the example on page 424: ```{r} lambda <- 35000 * 0.00011 gf_dist("pois", lambda = lambda, xlab = "x", ylab = "P(X=x)") 1 - ppois(7, lambda) ``` ### Section 16.7: Uniform and exponential Let's replicate the example on page 426 for the uniform distribution: ```{r} x <- runif(20000, min = 0, max = 20) mean(~ x) sd(~ x) ``` and on page 427 for the exponential distribution: ```{r} gf_dist("exp", rate = 1, xlab = "x", ylab = "f(x)") pexp(1/3, rate = 4) # rate is 1/E[X] ```