--- title: "SDM4 in R: Probability Models (Chapter 16)" author: "Nicholas Horton (nhorton@amherst.edu)" date: "August 12, 2017" output: pdf_document: fig_height: 2.8 fig_width: 6 html_document: fig_height: 3 fig_width: 5 word_document: fig_height: 4 fig_width: 6 --- ```{r, include=FALSE} # Don't delete this chunk if you are using the mosaic package # This loads the mosaic and dplyr packages require(mosaic) ``` ```{r, include=FALSE} # Some customization. You can alter or delete as desired (if you know what you are doing). # This changes the default colors in lattice plots. trellis.par.set(theme=theme.mosaic()) # knitr settings to control how R chunks work. require(knitr) opts_chunk$set( tidy=FALSE, # display code as typed size="small" # slightly smaller font for code ) ``` ## Introduction and background This document is intended to help describe how to undertake analyses introduced as examples in the Fourth Edition of \emph{Stats: Data and Models} (2014) by De Veaux, Velleman, and Bock. More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This file as well as the associated R Markdown reproducible analysis source file used to create it can be found at http://nhorton.people.amherst.edu/sdm4. This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic). A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024. ## Chapter 16: Probability Models ### Section 16.1: Bernoulli models We can replicate the calculation on page 413: ```{r} library(mosaic); library(readr); options(digits=3) dbinom(0, size=1, prob=0.2) # Probability that you don't get Hope's picture dbinom(1, size=1, prob=0.2) # Probability that you do dgeom(1, prob=0.2) # Probability that you get it on the second try P(X=1) # Note that the geometric in R doesn't include the final success! ``` ### Section 16.2: The Geometric model We can replicatethe example from page 414: ```{r} p <- 0.09; p plotDist("geom", p, xlim=c(-1, 40), xlab="x", ylab="P(X=x)", main="Geometric distribution") 1/p # true expected value mean(~ rgeom(20000, prob=p)) # simulation to check ``` and the Think/Show/Tell on pages 415-416: ```{r} p <- 0.06 dgeom(0, prob=p) dgeom(1, prob=p) dgeom(2, prob=p) dgeom(3, prob=p) pgeom(3, prob=p) ``` ### Section 16.3: The Binomial model Let's replicate the example on page 417: ```{r} p <- 0.2; p # back to Hope Solo example n <- 5 plotDist("binom", params=c(size=n, prob=p), xlim=c(-0.2, 5.2), xlab="x", ylab="P(X=x)", main="Binomial distribution") n*p # true expected value n*p*(1-p) # true variance mysamp <- rbinom(20000, size=n, prob=p) # simulation to check mean(~ mysamp) var(~ mysamp) dbinom(2, size=5, prob=p) ``` or the SPAM example on page 418: ``` dbinom(1, size=25, prob=0.09) dbinom(2, size=25, prob=0.09) sum(dbinom(1:2, size=25, prob=0.09)) ``` ### Section 16.4: Normal approximation to the binomial Let's replicate the example on page 419: ```{r} p <- 0.06; n <- 32000 plotDist("binom", params=list(n, p), xlab="x", ylab="P(X=x)") ex <- n*p; varx <- n*p*(1-p) plotDist("norm", params=list(ex, sqrt(varx)), xlab="x", ylab="f(x)") pbinom(1849, n, p) pnorm(1849, ex, sqrt(varx)) ``` Your parents had to use the normal approximation. The good news is that you don't have to if you use R. ### Section 16.5: The continuity correction Let's replicate the example on page 421 to demonstrate that we don't need the continuity correction if we calculate the exact distribution of the binomial. ```{r} p <- 0.2; n <- 50 dbinom(10, n, p) pnorm(10.5, n*p, sqrt(n*p*(1-p))) - pnorm(9.5, n*p, sqrt(n*p*(1-p))) # continuity correction ``` ### Section 16.6: The Poisson model Let's replicate the example on page 424: ```{r} lambda <- 35000*0.00011 plotDist("pois", lambda, xlab="x", ylab="P(X=x)") 1 - ppois(7, lambda) ``` ### Section 16.7: Uniform and exponential Let's replicate the example on page 426 for the uniform distribution: ```{r} x <- runif(20000, min=0, max=20) mean(~ x) sd(~ x) ``` and on page 427 for the exponential distribution: ```{r} plotDist("exp", 1, xlim=c(0, 5), xlab="x", ylab="f(x)") pexp(1/3, rate=4) # rate is 1/E[X] ```