---
title: "NYC flights data wrangling example"
author: "Nicholas Horton (nhorton@amherst.edu)"
date: "June 14, 2021"
output:
  pdf_document:
    fig_height: 7
    fig_width: 8
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 3
    fig_width: 5
---

```{r, include=FALSE}
library(tidyverse)
library(knitr)
opts_chunk$set(
  tidy=FALSE,     # display code as typed
  size="small"    # slightly smaller font for code
)
```

## Introduction

This document is intended to provide you an opportunity to practice data wrangling using the `tidyverse`.
Let's begin by exploring the dataset of flights departing from New York City (NYC) airports in the year 2013.

```{r message = FALSE}
library(tidyverse)
library(nycflights13)
```

## Familiarizing ourselves with the dataset

What variables are included in the `flights` dataset? How many rows are there?

```{r}
glimpse(flights)
```

SOLUTION:

What variables are included in the `airports` dataset? How many rows are there?

```{r}

```

SOLUTION:

Which variables are included in the `airlines` dataset? How many rows are there?

```{r}

```

SOLUTION:

## Focusing on Atlanta

Let's focus on flights from NYC area airports to Atlanta GA (FAA code ATL).
Create a new object `atlanta` that includes only these flights (hint: use `filter()`).
How many flights to Atlanta were there in 2013?

```{r}
atlanta <- flights %>%
  filter(dest == "ATL")
glimpse(atlanta)
```

SOLUTION:

## Seasonality

Is there a difference in the number of flights per month?
Summarize the number of flights for each month and provide a sorted list with the months with the most flights first (hint: use `group_by()` in combination with `summarize()`).

```{r}
atlanta %>%
  group_by(month) %>%
  summarize(num_flights = n()) %>%
  arrange(desc(num_flights))
```

The most flights occurred during ...

## Which airlines

Which airlines flew to Atlanta and how often? (Hint: use `left_join()` to make the carrier name more descriptive.)

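As a rough sketch of how `left_join()` can attach a descriptive name to a code, the example below uses two small made-up tables rather than the `nycflights13` data. The objects `toy_flights`, `toy_carriers`, and `toy_counts`, along with the carrier codes and names in them, are hypothetical and only illustrate the pattern.

```r
library(dplyr)

# Hypothetical toy data (not from nycflights13):
# three made-up flights and a lookup table of made-up carrier names.
toy_flights <- data.frame(
  carrier = c("AA", "BB", "AA"),
  dest    = c("XYZ", "XYZ", "XYZ")
)
toy_carriers <- data.frame(
  carrier = c("AA", "BB"),
  name    = c("Alpha Air", "Beta Air")
)

# Count flights per carrier code, then look up the descriptive name.
# left_join() keeps every row of the left table and adds the matching
# columns from the right table, matched on the shared `carrier` column.
toy_counts <- toy_flights %>%
  group_by(carrier) %>%
  summarize(num_flights = n()) %>%
  left_join(toy_carriers, by = "carrier") %>%
  arrange(desc(num_flights))

toy_counts
```

Naming the key with `by = "carrier"` makes the match explicit; when `by` is left out, `left_join()` matches on all columns the two tables have in common and prints a message saying which ones it used.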
```{r}

```

## Which airports

Which airports had flights that flew to Atlanta and how often? (Hint: use `left_join()` to make the airport name more descriptive.)

```{r}

```

SOLUTION:

## Flight delays

What is the average arrival delay (in minutes) by airline? (Hint: many flights are missing this information, so you will need to include the `na.rm = TRUE` option to the call to the `mean()` function.)

```{r}

```

SOLUTION:

## Challenge

Can you find an interesting insight about flights from NYC area airports to Atlanta during 2013?
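
Returning to the hint in the flight delays section, the sketch below shows how `na.rm = TRUE` changes the result of `mean()` inside `summarize()`. It uses a small made-up table, not the `nycflights13` data: `toy_delays`, `toy_summary`, and the values in them are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: made-up arrival delays in minutes,
# with one missing value for carrier "AA".
toy_delays <- data.frame(
  carrier   = c("AA", "AA", "BB", "BB"),
  arr_delay = c(10, NA, -5, 15)
)

toy_summary <- toy_delays %>%
  group_by(carrier) %>%
  summarize(
    mean_with_na = mean(arr_delay),               # NA when any value in the group is missing
    mean_no_na   = mean(arr_delay, na.rm = TRUE)  # missing values are dropped before averaging
  )

toy_summary
```

Without `na.rm = TRUE`, a single missing delay makes the whole group's mean `NA`, which is why the hint asks for that option.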