---
title: "IS4 in R: Displaying and Summarizing Quantitative Data (Chapter 3)"
author: "Patrick Frenett, Vickie Ip, and Nicholas Horton (nhorton@amherst.edu)"
date: "June 20, 2018"
output:
  pdf_document:
    fig_height: 2.8
    fig_width: 5
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---

```{r, include = FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
```

```{r, include = FALSE}
# knitr settings to control how R chunks work.
require(knitr)
opts_chunk$set(
  tidy = FALSE,     # display code as typed
  size = "small",   # slightly smaller font for code
  fig.align = "center"
)
```

## Introduction and Background

This document is intended to help describe how to undertake analyses introduced as examples in the Fourth Edition of *Intro Stats* (2013) by De Veaux, Velleman, and Bock. More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series. This file, as well as the associated R Markdown reproducible analysis source file used to create it, can be found at https://nhorton.people.amherst.edu/is4.

This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic). A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.

Note that some of the figures in this document may differ slightly from those in the IS4 book due to small differences in datasets. However, in all cases the analysis and techniques in R are accurate.

## Chapter 3: Displaying and Summarizing Quantitative Data

### Section 3.1: Displaying Quantitative Variables

See Figure 3.1 on page 44.

```{r message = FALSE}
library(mosaic)
library(readr)
library(ggformula)
options(digits = 3)
Tsunami <- read_delim("https://nhorton.people.amherst.edu/sdm4/data/Tsunami_Earthquakes.txt",
                      delim = "\t")
```

```{r}
nrow(Tsunami)
```

```{r}
# Counts per bin
gf_histogram(~ Magnitude, binwidth = 0.5, center = 0.5/2, data = Tsunami,
             fill = "orange", col = TRUE) %>%
  gf_labs(y = "Count")
# Percentage of all observations per bin
gf_histogram((..count../sum(..count..)*100) ~ Magnitude, center = 0.5/2,
             binwidth = 0.5, fill = "red", col = TRUE, data = Tsunami) %>%
  gf_labs(y = "Percentage of Total")
# Density scale: the bar areas sum to 1
gf_histogram(..density.. ~ Magnitude, binwidth = 0.5, center = 0.5/2 + 0.001,
             data = Tsunami, fill = "lightblue3", col = TRUE) %>%
  gf_labs(y = "Density")
```

Note that Figure 3.3 on page 45 displays a histogram with the y-axis measured as the percent in each bar, as in the second histogram above. The first histogram displays the count and the last the density (where the total area of the bars adds up to 1).

```{r message=FALSE}
Pulse_rates <- read_delim("http://nhorton.people.amherst.edu/sdm4/data/Pulse_rates.txt",
                          delim = "\t")
```

```{r message = FALSE}
with(Pulse_rates, stem(Pulse))
gf_dotplot(~ Pulse, data = Pulse_rates, fill = "red")
```

See also the stem-and-leaf display on page 46.

```{r}
with(Pulse_rates, stem(Pulse, scale = 2))
```

### Section 3.2: Shape

### Section 3.3: Center

See calculation and Figure 3.11 on page 51.

```{r}
# Keep only the earthquakes from 1989 through 2013
recent <- filter(Tsunami, Year >= 1989, Year <= 2013)
nrow(recent)
median(~ Magnitude, data = recent)
gf_histogram(..density.. ~ Magnitude, binwidth = 0.2, data = recent,
             fill = "cornflowerblue", col = TRUE) %>%
  gf_labs(y = "Density")
```

### Section 3.4: Spread

See statistics reported on pages 54-55.

```{r}
df_stats(~ Magnitude, data = recent)
range(~ Magnitude, data = recent)        # minimum and maximum
diff(range(~ Magnitude, data = recent))  # range as a single number (max - min)
IQR(~ Magnitude, data = recent)
```

### Section 3.5: Boxplots and 5-Number Summaries

See display on page 55.
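
Alongside the boxplot, the five-number summary that it is built from (minimum, lower quartile, median, upper quartile, maximum) can be sketched in base R. The vector `mags` below is a small hypothetical sample made up for illustration, not the Tsunami data; the same calls should apply to a column such as `recent$Magnitude`.

```{r}
# Hypothetical magnitudes, for illustration only (not the Tsunami data)
mags <- c(5.9, 6.4, 6.8, 7.0, 7.1, 7.3, 7.6, 8.0, 9.0)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(mags)
five

# Quartiles by R's default quantile method
quart <- quantile(mags)
quart

# Interquartile range: upper quartile minus lower quartile
iqr <- IQR(mags)
iqr
```

Note that `fivenum()` reports hinges, which can differ slightly from the quartiles returned by `quantile()` for some sample sizes.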

```{r}
gf_boxplot(Magnitude ~ 1, data = recent) %>%
  gf_theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) %>%
  gf_labs(x = "")
```

Note that boxplots of a single distribution aren't usually very interesting (more useful displays will be seen in Chapter 4 when we start comparing groups).

### Section 3.6: The Center of Symmetric Distributions: The Mean

See calculation on page 57.

```{r}
mean(~ Magnitude, data = recent)
median(~ Magnitude, data = recent)
```

### Section 3.7: The Spread of Symmetric Distributions: The Standard Deviation

To check the claim made on page 60.

```{r}
sd(~ Magnitude, data = recent)
var(~ Magnitude, data = recent)
sqrt(var(~ Magnitude, data = recent))
0.702^2
```

The standard deviation squared equals the variance.
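
In symbols, this relationship can be written as follows. The notation ($y$ for the data values, $\bar{y}$ for their mean, $n$ for the sample size, $s$ for the standard deviation) is an assumption chosen to follow common introductory usage, not quoted from the book; the $n - 1$ divisor is the one used by R's `var()` and `sd()`.

$$
s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}, \qquad s = \sqrt{s^2}
$$

So `sqrt(var(...))` reproduces `sd(...)`, and squaring the standard deviation recovers the variance.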