---
title: "IPS9 in R: Producing data (Chapter 3)"
author: "Shukry Zablah (szablah20@amherst.edu) and Nicholas Horton (nhorton@amherst.edu)"
date: "July 25, 2018"
output:
  pdf_document:
    fig_height: 3
    fig_width: 5
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 4
    fig_width: 6
---
```{r, include = FALSE}
# Don't delete this chunk if you are using the mosaic package
# This loads the mosaic and dplyr packages
require(mosaic)
```
```{r, include = FALSE}
# knitr settings to control how R chunks work.
knitr::opts_chunk$set(
  tidy = FALSE,   # display code as typed
  size = "small"  # slightly smaller font for code
)
```
## Introduction and background
These documents describe how to carry out the analyses introduced
as examples in the Ninth Edition of *Introduction to the Practice of Statistics* (2017) by Moore, McCabe, and Craig.
More information about the book can be found [here](https://macmillanlearning.com/Catalog/product/introductiontothepracticeofstatistics-ninthedition-moore).
The data used in these documents can be found under Data Sets in the [Student Site](https://www.macmillanlearning.com/catalog/studentresources/ips9e?_ga=2.29224888.526668012.1531487989-1209447309.1529940008#). This
file, along with the R Markdown source file used to create it, can be found at https://nhorton.people.amherst.edu/ips9/.
This work leverages initiatives undertaken by Project MOSAIC (http://www.mosaic-web.org), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the `mosaic` package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignettes (http://cran.r-project.org/web/packages/mosaic). A paper describing the mosaic approach was published in the *R Journal*: https://journal.r-project.org/archive/2017/RJ-2017-024.
## Chapter 3: Producing data
This file replicates the analyses from Chapter 3: Producing data.
First, load the packages that will be needed for this document:
```{r load-packages}
library(mosaic)
library(readr)
```
### Section 3.1: Sources of Data
### Section 3.2: Design of Experiments
See Example 3.12 on page 178.
We want to set up a dataset in which each observation is randomly assigned to either a "Control" or a "Treatment" group.
```{r ex12_1}
# Fig. 3.4(b)
set.seed(1) # guarantees the same random results each time R runs
Randomized <- data.frame(ID = 1:10, randNum = runif(10))
Randomized
```
We created a data frame in which the first column holds unique IDs and the second column holds the random numbers that will determine each ID's group.
Our plan is to order the IDs based on their random number and assign half of the observations to the Control group and the other half to the Treatment group. We start by arranging the observations.
```{r ex12_2}
# Fig. 3.4(c)
Randomized <- Randomized %>%
  arrange(randNum)
Randomized
```
And finally we assign each observation to a group.
```{r ex12_3}
Randomized <- Randomized %>%
  mutate(Group = c(rep("Treatment", 5), rep("Control", 5)))
Randomized
```
Now your observations are randomly assigned to a group! Here we chose to assign the first five IDs in the sorted order to the Treatment group and the rest to the Control group, but there are other ways to do it. To finish, you can clean up the dataset by selecting only the ID and Group columns and ordering it by ID.
```{r ex12_4}
# Fig. 3.4(d)
Randomized %>%
  select(ID, Group) %>%
  arrange(ID)
```
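One alternative way to make the assignment is to skip the sorting step and shuffle the group labels directly. The chunk below is a minimal sketch of that idea, not the textbook's procedure; the object names `labels` and `Shuffled` are introduced here for illustration only.

```{r ex12_shuffle}
set.seed(1) # fix the seed so the assignment is reproducible

# Ten labels, five per group
labels <- rep(c("Treatment", "Control"), each = 5)

# sample() on a vector returns its elements in random order,
# so each ID receives a randomly positioned label
Shuffled <- data.frame(ID = 1:10, Group = sample(labels))
Shuffled

# Check that each group received exactly five IDs
table(Shuffled$Group)
```

Because the labels are only permuted, the group sizes stay fixed at five each; only which IDs land in which group is random.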
You can do this for any number of groups and for as many observations as you want.
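As a sketch of that generalization, the chunk below applies the same sort-by-random-number idea to twelve IDs and three groups. The group names, the group size, the seed, and the object name `ThreeGroups` are all assumptions made for illustration.

```{r ex12_three_groups}
library(dplyr)

set.seed(2) # arbitrary seed, for reproducibility only

groups <- c("A", "B", "C") # hypothetical group names
n_per_group <- 4           # hypothetical number of observations per group

ThreeGroups <- data.frame(ID = 1:(length(groups) * n_per_group)) %>%
  mutate(randNum = runif(n())) %>%                    # one random number per ID
  arrange(randNum) %>%                                # put the IDs in random order
  mutate(Group = rep(groups, each = n_per_group)) %>% # first 4 rows to A, next 4 to B, last 4 to C
  select(ID, Group) %>%
  arrange(ID)
ThreeGroups
```

Changing `groups` or `n_per_group` is enough to adapt the sketch to a different design, as long as every group has the same size.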
### Section 3.3: Sampling Design
See Example 3.23 on page 192. We want to draw a simple random sample (SRS) of 25 countries from a list of 189.
```{r ex3.23_1, message=FALSE}
Countries <- read_csv("https://nhorton.people.amherst.edu/ips9/data/chapter03/EG03-23TTS.csv")
```
With the `mosaic` package loaded, the `sample()` function can draw a random sample of rows directly from a data frame.
```{r}
#Fig3.7
SampleCountries <- sample(Countries, size = 25)
SampleCountries
```
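For comparison, here is a minimal base R sketch of the same idea that does not rely on `mosaic`. It draws row numbers and subsets by them, since base R's `sample()` applied directly to a data frame treats it as a list of columns rather than a set of rows. To keep the chunk self-contained, `Population` is a made-up stand-in for the `Countries` data (189 placeholder names), and the seed is arbitrary.

```{r ex3.23_base}
set.seed(3) # arbitrary seed so the same rows are drawn on each run

# Hypothetical stand-in for the Countries data: 189 placeholder rows
Population <- data.frame(Country = paste("Country", 1:189))

# Draw 25 distinct row numbers without replacement, then keep those rows
rows <- base::sample(nrow(Population), size = 25)
SampleRows <- Population[rows, , drop = FALSE]
head(SampleRows)
```

Setting a seed before sampling, as in the randomization example earlier, makes the selected rows the same each time the document is knit.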
### Section 3.4: Ethics