---
title: "SDM4 in R: Displaying and Describing Quantitative Data (Chapter 3)"
author: "Nicholas Horton (nhorton@amherst.edu)"
date: "June 13, 2018"
output: 
  pdf_document:
    fig_height: 4
    fig_width: 6
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---


```{r, include = FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
library(mosaic)
```

```{r, include = FALSE}
# knitr settings to control how R chunks work.
library(knitr)
opts_chunk$set(
  tidy = FALSE,     # display code as typed
  size = "small"    # slightly smaller font for code
)
```

## Introduction and background 

This document describes how to carry out the analyses introduced 
as examples in the Fourth Edition of *Stats: Data and Models* (2014) by De Veaux, Velleman, and Bock.
More information about the book can be found at http://wps.aw.com/aw_deveaux_stats_series.  This
file, along with the R Markdown source file used to create it, can be found at http://nhorton.people.amherst.edu/sdm4.

This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic).
A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.

## Chapter 3: Displaying and describing quantitative data

### Section 3.1: Displaying quantitative variables

See Figure 3.1 on page 46.

```{r message = FALSE}
library(mosaic)
library(readr)
options(digits = 3)
Tsunami <- read_delim("http://nhorton.people.amherst.edu/sdm4/data/Tsunami_Earthquakes.txt", 
  delim = "\t")
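nrow(Tsunami)   # number of rows (earthquakes) in the dataset
# Histogram of counts; `center` is offset so bin edges sit just above multiples of 0.5
gf_histogram(~ Magnitude, binwidth = 0.5, center = 0.5/2+0.001, 
  data = Tsunami)
# Same bins on the density scale (`..density..` is deprecated in recent ggplot2)
gf_dhistogram(~ Magnitude, binwidth = 0.5, center = 0.5/2+0.001, 
  data = Tsunami)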
```

Note that Figure 3.3 on page 47 displays a histogram with the y-axis showing the percent of cases in each bar.  Of the two histograms above, the first displays counts and the second displays density (scaled so that the total area of the bars equals 1).
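
The relationship between the count, percent, and density scales can be sketched in base R. This is only an illustration: the vector `x` holds made-up values (not the tsunami data), and the bin breaks are assumptions chosen for the example.

```{r}
# Illustrative values only (not taken from the Tsunami dataset)
x <- c(5.1, 5.6, 6.2, 6.4, 6.8, 7.0, 7.1, 7.3, 7.7, 8.2)
width <- 0.5

# Bin the values without drawing a plot
h <- hist(x, breaks = seq(5, 8.5, by = width), plot = FALSE)

counts   <- h$counts                         # number of cases per bin
percents <- 100 * counts / sum(counts)       # percent of cases per bin
density  <- counts / (sum(counts) * width)   # proportion divided by bin width

sum(percents)          # percents add up to 100
sum(density * width)   # bar areas (height times width) add up to 1
```

All three scales produce the same shape; only the labeling of the y-axis changes.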

```{r message = FALSE}
Pulse_rates <- read_delim("http://nhorton.people.amherst.edu/sdm4/data/Pulse_rates.txt",
  delim = "\t")
with(Pulse_rates, stem(Pulse))
gf_dotplot(~ Pulse, data = Pulse_rates)
```

Or, as displayed on page 49:

```{r}
with(Pulse_rates, stem(Pulse, scale = 2))
```

### Section 3.2: Shape


### Section 3.3: Center

See calculation and Figure 3.11 on page 53.

```{r}
# Keep only earthquakes from 1989 through 2013
recent <- filter(Tsunami, Year >= 1989, Year <= 2013)
nrow(recent)
median(~ Magnitude, data = recent)
gf_histogram(~ Magnitude, binwidth = 0.2, data = recent)
```

### Section 3.4: Spread

See statistics reported on pages 54-55.

```{r}
favstats(~ Magnitude, data = recent)
range(~ Magnitude, data = recent)
diff(range(~ Magnitude, data = recent))
IQR(~ Magnitude, data = recent)
```
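
As a rough sketch of what these summaries compute, the range and IQR can be reproduced by hand on a small vector. The values in `x` are made up for illustration. Note that R's default quantile definition may give quartiles that differ slightly from a hand calculation using the textbook's method.

```{r}
# Illustrative values only (not taken from the Tsunami dataset)
x <- c(5.1, 5.6, 6.2, 6.4, 6.8, 7.0, 7.1, 7.3, 7.7, 8.2)

# Range: largest value minus smallest value
range_by_hand <- max(x) - min(x)

# IQR: third quartile minus first quartile
q <- quantile(x, probs = c(0.25, 0.75))
iqr_by_hand <- unname(q[2] - q[1])

# Compare with the built-in functions
c(range_by_hand, diff(range(x)))
c(iqr_by_hand, IQR(x))
```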

### Section 3.5: Boxplots and 5-Number Summaries

See display on page 57.

```{r}
gf_boxplot(Magnitude ~ 1, data = recent)
```

Note that boxplots of a single distribution aren't usually very interesting (more useful displays will be seen in Chapter 4 when we start comparing groups).
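
The quantities behind a boxplot can be sketched by hand: the five-number summary, plus the fences at 1.5 IQRs beyond the quartiles that ggplot2's boxplots use by default to flag outliers. The vector `x` is made-up illustrative data with one deliberately extreme value, and the variable names are chosen for this example only.

```{r}
# Illustrative values only; the last value is deliberately extreme
x <- c(5.1, 5.6, 6.2, 6.4, 6.8, 7.0, 7.1, 7.3, 7.7, 9.9)

# Quartiles using R's default quantile definition
q1 <- unname(quantile(x, 0.25))
q3 <- unname(quantile(x, 0.75))

# Five-number summary: min, Q1, median, Q3, max
five_num <- c(min = min(x), Q1 = q1, median = median(x), Q3 = q3, max = max(x))
five_num

# Fences at 1.5 IQRs beyond the quartiles; points outside are shown as outliers
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]
outliers
```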

### Section 3.6: The Center of Symmetric Distributions: The Mean

See calculation on page 59.

```{r}
mean(~ Magnitude, data = recent)
median(~ Magnitude, data = recent)
```

### Section 3.7: The Spread of Symmetric Distributions: The Standard Deviation


```{r}
sd(~ Magnitude, data = recent)
var(~ Magnitude, data = recent)
sqrt(var(~ Magnitude, data = recent))
0.702^2   # rounded sd from the output above, squared
```

The standard deviation squared equals the variance.
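
A minimal sketch of this relationship, computed by hand on made-up values (the vector `x` is illustrative, not the tsunami data). R's `var()` and `sd()` divide by n - 1, so the by-hand version does the same.

```{r}
# Illustrative values only (not taken from the Tsunami dataset)
x <- c(5.1, 5.6, 6.2, 6.4, 6.8, 7.0, 7.1, 7.3, 7.7, 8.2)

n <- length(x)
deviations <- x - mean(x)                     # distance of each value from the mean

var_by_hand <- sum(deviations^2) / (n - 1)    # average squared deviation (n - 1 divisor)
sd_by_hand  <- sqrt(var_by_hand)              # standard deviation is its square root

all.equal(var_by_hand, var(x))
all.equal(sd_by_hand, sd(x))
all.equal(sd(x)^2, var(x))
```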