Teaching Statistics using R and RStudio (Skidmore workshop, May 2015)
Spring ASA Chapter Workshop: Teaching Statistics using R and RStudio, presented by Nicholas Horton, Amherst College
Course description
R is a freely available language and environment for statistical computing and graphics that has become popular in academia and in many industries. But can it be used with students? This mini-course will introduce participants to teaching applied statistics courses using computing in an integrated way. The presenter has been using R to teach statistics to undergraduates at all levels for the last decade and will share his approach and favorite examples. Topics will include workflow in the RStudio environment, providing novices with a powerful but manageable set of tools, data visualization, basic statistical inference using R, and resampling. Much of this will be facilitated using the mosaic package. The short-course is designed to be accessible to those with little or no experience teaching with R, but with some prior exposure to R. The short-course is intended to provide participants with skills, examples, and resources that they can use in their own teaching.
Workshop materials
We use the mosaic package extensively to simplify R for students while still providing a powerful set of tools. The package is freely available on CRAN, the Comprehensive R Archive Network.
Vignettes: Each participant will receive a printed copy of the following package vignettes (these and others are also available from within the mosaic package or at the CRAN mosaic package webpage).
- Start Teaching Statistics using R
- A Compendium of Commands to Teach Statistics with R
- Less Volume, More Creativity: Getting started with mosaic (HTML slides introducing the mosaic package)
- Downloading instructions
- Workshop slides #1
- Workshop slides #2
- Open Intro Statistics mosaic labs
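As a rough illustration of the "powerful but manageable set of tools" idea, the sketch below uses the formula template that the mosaic vignettes emphasize, goal(y ~ x, data = mydata), for summaries, graphics, inference, and resampling. It is not taken from the workshop materials: the choice of the built-in mtcars data, the variables, and the seed are assumptions made only for illustration.

```r
library(mosaic)  # formula-interface wrappers; lattice graphics are attached with it

# One-variable summary: goal(~ x, data = mydata)
mean(~ mpg, data = mtcars)

# Summary by group: goal(y ~ x, data = mydata)
favstats(mpg ~ cyl, data = mtcars)

# The same formula shape drives a lattice boxplot
bwplot(mpg ~ factor(cyl), data = mtcars)

# ... and a two-sample t-test
t.test(mpg ~ am, data = mtcars)

# Resampling: bootstrap distribution of the sample mean
set.seed(2015)  # arbitrary seed so the sketch is repeatable
boot <- do(1000) * mean(~ mpg, data = resample(mtcars))

# Percentile interval computed from the 1000 bootstrap means
quantile(boot$mean, c(0.025, 0.975))
```

Because every call follows the same template, students mostly need to learn which "goal" function to use rather than a new syntax for each task.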
Handouts, Quizzes, etc.
- confound.Rmd
- Logistics of teaching with R
- R Quiz [source][pdf]
- R Guide [source][pdf]
- R Markdown Prezi
- Nick's R Resources page
Book companions in mosaic
We have put together drafts of companion volumes for several Intro Stats
books. Each shows how to use the mosaic package to create all the figures and analyses in the examples.
Related slides and papers
- "Setting the stage for data science: integration of data management skills in introductory and second courses in statistics",
Nicholas J. Horton, Benjamin S. Baumer, and Hadley Wickham (CHANCE, 2015), 28(2):40-50, http://arxiv.org/abs/1502.00318, full-text
- "Teaching precursors to data science in introductory and second courses in statistics",
Nicholas J. Horton, Benjamin S. Baumer, and Hadley Wickham (2014), http://arxiv.org/abs/1401.3269 plus slides
- "R Markdown: integrating a reproducible analysis tool into introductory statistics",
Benjamin S. Baumer, Mine Cetinkaya-Rundel, Andrew Bray, Linda Loi, and Nicholas J. Horton
(TISE, 2014), http://arxiv.org/abs/1402.1894
- "Data science in the statistics curricula: preparing students to 'think with data'",
Johanna Hardin, Roger Hoerl, Nicholas J. Horton, and Deborah Nolan (2014),
http://arxiv.org/abs/1410.3127,
syllabi, activities, and related resources
- "Challenges and opportunities for statistics and statistical education: looking back, looking
forward", Nicholas J. Horton (TAS, 2015), http://arxiv.org/abs/1503.02188
- Data wrangling, visualization, R Markdown, and Shiny cheat sheets
- Visualizing data manipulation operations (Shiny)
- Second edition of Using R for Data Management, Statistical Analysis, and Graphics, Nicholas J. Horton and Ken Kleinman (2015)
- Slides and recording from February 24, 2015 CAUSE webinar
Airline delays examples
- Airline delays example files (nycflights13.pdf,
nycflights13.Rmd) using the nycflights13 package in R
- update R and RStudio to recent versions
- run update.packages()
- run install.packages(c("mosaic", "nycflights13"))
- run download.file("http://www.amherst.edu/~nhorton/precursors/nycflights13.Rmd",
"nycflights13.Rmd")
- Airline delays SQL intro slides
- Airline delays example files using small SQLite database (just 2014) using RSQLite and dplyr
- update R and RStudio to recent versions
- run update.packages()
- run install.packages(c("RSQLite", "dplyr", "tidyr", "mosaic", "knitr", "nycflights13", "lubridate", "igraph", "markdown", "maps", "readr"))
- download the following files:
load-sqlite.R,
test-sqlite.Rmd,
airlines.csv,
airplanes.csv,
airports.csv,
2014.csv.bz2 (95MB)
- set up a new project in RStudio specifying the directory/folder that contains the files that you downloaded
- source the script file load-sqlite.R. This should create the database (called ontime.sqlite3) and display information about three airports.
- test the setup by knitting the Markdown file test-sqlite.Rmd in the same directory where you saved the database (this should generate test-sqlite.pdf as output)
- Airline delays example files using large SQLite database (precursors-sqlite.pdf,
precursors-sqlite.Rmd, ran in 30-150 seconds with indices, approximately 1,000 seconds without)
- find a machine with fast internet and lots of disk space (approximately 50GB needed)
- run install.packages(c("RSQLite", "dplyr", "tidyr", "mosaic", "knitr", "nycflights13", "lubridate", "igraph", "markdown", "maps", "readr"))
- download data for 1987-2008 plus supplemental data sources (airlines, airports, airplanes) from the Data Expo 2009 website
- download data from 2009 to today using the following scripts:
1-download.r and
2-reduce.r
- download the following files:
load-sqlite-all.R,
airlines.csv,
airplanes.csv,
and airports.csv
- set up a new project in RStudio specifying the directory/folder that contains the files that you downloaded
- set up the database by running the commands in the script file
load-sqlite-all.R
- Database vignette from the dplyr package in R
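The SQLite steps above amount to loading flight records into a database table, indexing it, and querying it from R. The sketch below shows that pattern on a tiny scale. It is not the contents of load-sqlite.R: the toy data frame, the table name flights, the column names, and the index name are all hypothetical, and it calls DBI and RSQLite directly rather than going through dplyr. It uses an in-memory database so that nothing is written to disk.

```r
library(DBI)      # generic database interface
library(RSQLite)  # SQLite driver

# In-memory database for illustration; the workshop scripts create a file
# on disk (ontime.sqlite3) instead.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy records standing in for the airline delays data.
flights <- data.frame(
  Year     = c(2014L, 2014L, 2014L, 2014L),
  Origin   = c("BDL", "BDL", "BOS", "BOS"),
  Dest     = c("ORD", "ATL", "ORD", "SFO"),
  ArrDelay = c(10, -5, 30, 0)
)
dbWriteTable(con, "flights", flights)

# Index the column used for grouping and filtering; on the full data set,
# indices are what separate the faster and slower run times quoted above.
dbExecute(con, "CREATE INDEX idx_origin ON flights (Origin)")

# Average arrival delay by origin airport
delays <- dbGetQuery(con, "
  SELECT Origin, COUNT(*) AS n, AVG(ArrDelay) AS mean_delay
  FROM flights
  GROUP BY Origin
  ORDER BY Origin
")
print(delays)  # BDL averages 2.5 minutes and BOS 15 minutes in this toy table

dbDisconnect(con)
```

Pointing dbConnect() at a file path instead of ":memory:" would keep the database between sessions, which matters once the table is too large to rebuild casually.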
Partial support for this work was provided by the National Science Foundation DUE 0920350 (Project MOSAIC).
Nicholas Horton
Department of Mathematics and Statistics
Amherst College
AC#2239
PO Box 5000
Amherst, MA 01002-5000
413-542-5655 (voice)