---
title: "Sample homework"
author: "YOUR NAME GOES HERE"
date: "June 17, 2020"
output:
  pdf_document:
    fig_height: 3.1
    fig_width: 7
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 3
    fig_width: 5
---

```{r include=FALSE}
# Don't delete this chunk if you are using the mosaic package
library(mosaic)
library(knitr)
opts_chunk$set(
  tidy=FALSE,     # display code as typed
  size="small"    # slightly smaller font for code
)
```

This homework would usually be submitted via Gradescope prior to the start of class.

#### PROBLEMS TO TURN IN

#### HELPrct Physical Component Scores

a) Using data from the `HELPrct` study, describe the distribution of the `pcs` scores for male subjects who reported being housed on the `homeless` variable (hint: use the `filter()` command to generate the appropriate subset of the data). Be sure to describe the shape, center, and spread and include a single graphical display. (Hint: use the `df_stats()` function.)

SOLUTION:

```{r}
# Keep only the variables needed for this problem
smallds <- HELPrct %>%
  select(pcs, homeless, sex)
glimpse(smallds)
# Check the levels and counts of the two grouping variables
tally(~ homeless, data = smallds)
tally(~ sex, data = smallds)
```

b) Calculate and interpret a 90% confidence interval for the population mean PCS score for male subjects with stable housing. (Hint: use the `t.test()` function.)

SOLUTION:

```{r}
```

\newpage

#### IS5 20.22

The 2013 World Drug Report investigated the prevalence of drug use as a percentage of the population aged 15 to 64. Data from 32 European countries are included in the dataset.

```{r message = FALSE, warning = FALSE}
Druguse <-
  readr::read_csv("https://github.com/nicholasjhorton/SDM4inR/raw/master/data/Drug_use_2013.csv") %>%
  janitor::clean_names() %>%
  rename(cannabis = canabis) %>%
  na.omit()
names(Druguse)
gf_point(cocaine ~ cannabis, data = Druguse) %>%
  gf_lm()
```

Here we are exploring the relationship between cannabis use (as a percentage of the population) and cocaine use (as a percentage of the population).
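
As a reference point for the parts that follow, the line that `gf_lm()` superimposes on the scatterplot is the least-squares fit of a simple linear regression. The symbols $b_0$ and $b_1$ are notation assumed here for the estimated intercept and slope; they do not come from the assignment itself.

$$
\widehat{\text{cocaine}} = b_0 + b_1 \times \text{cannabis}
$$
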

a) Explain what the regression says (be sure to give a full report).

SOLUTION:

```{r}
mod <- lm(cocaine ~ cannabis, data = Druguse)
modplus <- broom::augment(mod)
names(modplus)
coef(mod)
```

b) State the hypothesis about the slope (both numerically and in words) that describes how use of marijuana is associated with other drugs.

SOLUTION:

c) Generate a scatterplot of the residuals as a function of the fitted values with a superimposed smoother. Is the assumption of linearity satisfied here? (Hint: this can be done in two ways, `mplot(mod, which = 1)` or through use of `gf_point()`.)

SOLUTION:

d) Assuming that the other assumptions for inference are satisfied, perform the hypothesis test and state your conclusion in context. (Be sure to report the 95% confidence interval as well as the p-value.)

SOLUTION:

```{r}
msummary(mod)
confint(mod)
```

e) Report the R-squared value and explain what R-squared means in context.

SOLUTION:

```{r}
rsquared(mod)
```

f) Do these results indicate that marijuana use leads to the use of harder drugs? Explain.

SOLUTION:

g) Which country has the largest negative residual? The largest positive residual? (Hint: use `arrange()` in conjunction with `head()` and `tail()`.)

SOLUTION:

```{r}
```
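
The `arrange()`, `head()`, and `tail()` pattern from the hint in part g) can be illustrated on made-up data. This is only a minimal sketch: the data frame `toy` and its columns `unit` and `resid` are hypothetical and unrelated to the drug-use dataset.

```{r}
library(dplyr)

# Hypothetical toy data: names and values are made up for illustration only
toy <- data.frame(
  unit  = c("A", "B", "C", "D", "E"),
  resid = c(0.4, -1.2, 2.5, -0.3, 0.1)
)

# Sort rows from the smallest (most negative) to the largest value of resid
sorted <- toy %>%
  arrange(resid)

head(sorted, 1)   # first row after sorting: the most negative value
tail(sorted, 1)   # last row after sorting: the most positive value
```

Sorting once and then looking at both ends avoids computing the minimum and maximum separately, and it keeps the identifying column alongside each value.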