---
title: "Sample homework"
author: "YOUR NAME GOES HERE"
date: "June 17, 2020"
output:
  pdf_document:
    fig_height: 3.1
    fig_width: 7
  html_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 3
    fig_width: 5
---


```{r include=FALSE}
# Don't delete this chunk if you are using the mosaic package
library(mosaic)
library(knitr)
opts_chunk$set(
  tidy=FALSE,     # display code as typed
  size="small"    # slightly smaller font for code
)
```

This homework would usually be submitted via Gradescope prior to the start of class.

#### PROBLEMS TO TURN IN


#### HELPrct Physical Component Scores

a) Using data from the `HELPrct` study, describe the distribution of the `pcs` scores for male subjects who reported being housed on the `homeless` variable.  Be sure to describe the shape, center, and spread, and include a single graphical display. (Hint: use the `filter()` command to generate the appropriate subset of the data and the `df_stats()` function to compute summary statistics.)


SOLUTION:

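```{r}
# Keep only the variables needed for this problem
smallds <- HELPrct %>%
  select(pcs, homeless, sex)
glimpse(smallds)
# Check the levels and counts of each grouping variable before filtering
tally(~ homeless, data = smallds)
tally(~ sex, data = smallds)
```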
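
The subsetting and summary workflow named in the hint can be sketched as follows. This is a minimal illustration, not the solution: it deliberately uses a different subgroup (female subjects who reported being homeless) and a different variable (`mcs`), and the object name `homeless_women` is a placeholder introduced here. The explicit `library()` and `data()` calls are included only so the chunk runs on its own.

```{r}
library(mosaic)
library(dplyr)
library(ggformula)
data("HELPrct", package = "mosaicData")

# Illustration only: a different subgroup and variable than the problem asks for
homeless_women <- HELPrct %>%
  filter(sex == "female", homeless == "homeless")

# Numerical summary (center and spread) of the subset
df_stats(~ mcs, data = homeless_women)

# One graphical display to judge the shape
gf_histogram(~ mcs, data = homeless_women, bins = 10)
```

The same pattern should carry over to the problem by changing the conditions inside `filter()` and the variable on the right-hand side of the formula.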

b) Calculate and interpret a 90% confidence interval for the population mean PCS score for male subjects with stable housing. (Hint: use the `t.test()` function.)

SOLUTION:

```{r}

```
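
A one-sample interval at a non-default confidence level can be requested through the `conf.level` argument of `t.test()`. The sketch below is an illustration under stated assumptions rather than the answer: it uses housed female subjects and the `mcs` variable, and the names `housed_women` and `ci_demo` are placeholders introduced here.

```{r}
library(mosaic)
library(dplyr)
data("HELPrct", package = "mosaicData")

# Illustration only: a different subgroup and variable than the problem asks for
housed_women <- HELPrct %>%
  filter(sex == "female", homeless == "housed")

# With mosaic loaded, t.test() accepts a one-sided formula plus data;
# conf.level = 0.90 replaces the default of 0.95
ci_demo <- t.test(~ mcs, data = housed_women, conf.level = 0.90)
ci_demo$conf.int
```

The interval is stored in the `conf.int` component of the result, along with the confidence level that was used.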

\newpage


#### IS5 20.22

The 2013 World Drug Report investigated the prevalence of drug use as a percentage of the population aged 15 to 64.  Data from 32 European countries are included in the dataset.


```{r message = FALSE, warning = FALSE}
Druguse <- readr::read_csv("https://github.com/nicholasjhorton/SDM4inR/raw/master/data/Drug_use_2013.csv") %>% 
  janitor::clean_names() %>%
  rename(cannabis = canabis) %>%
  na.omit()
names(Druguse)
gf_point(cocaine ~ cannabis, data = Druguse) %>%
  gf_lm()
```

Here we are exploring the relationship between cannabis use (as a percentage of the population) and cocaine use (as a percentage of the population).


a) Explain what the regression says (be sure to give a full report).

SOLUTION:

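```{r}
mod <- lm(cocaine ~ cannabis, data = Druguse)
# Passing data = Druguse keeps every original column (including the one
# identifying each country) next to the fitted values and residuals
modplus <- broom::augment(mod, data = Druguse)
names(modplus)
coef(mod)
```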

b) State the null and alternative hypotheses about the slope (both numerically and in words) that describe how the use of marijuana is associated with the use of other drugs.

SOLUTION:

c) Generate a scatterplot of the residuals as a function of the fitted values with a superimposed smoother.  Is the assumption of linearity satisfied here?  (Hint: this can be done in two ways: with `mplot(mod, which = 1)` or with `gf_point()`.)

SOLUTION:
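
The `gf_point()` route mentioned in the hint can be sketched as below. To keep the example separate from the homework answer, it fits a model to the built-in `mtcars` data instead of `Druguse`; the names `fit_demo` and `aug_demo` are placeholders introduced here, and `gf_smooth()` is one possible choice of smoother layer.

```{r}
library(dplyr)
library(ggformula)

# Illustration only: a simple regression on built-in data
fit_demo <- lm(mpg ~ wt, data = mtcars)

# augment() adds .fitted and .resid columns to the model data
aug_demo <- broom::augment(fit_demo)

# Residuals against fitted values, with a smoother and no standard-error band
gf_point(.resid ~ .fitted, data = aug_demo) %>%
  gf_smooth(se = FALSE)
```

A smoother that stays close to zero across the range of fitted values is consistent with linearity; a clear bend suggests the straight-line model is missing curvature.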


d) Assuming that the other assumptions for inference are satisfied, perform the hypothesis test and state your conclusion in context. (Be sure to report the 95% confidence interval as well as the p-value.)

SOLUTION:

```{r}
msummary(mod)
confint(mod)
```

e) Report the R-squared value and explain what R-squared means in context.

SOLUTION:

```{r}
rsquared(mod)
```

f) Do these results indicate that marijuana use leads to the use of harder drugs?  Explain.

SOLUTION:

g) Which country has the largest negative residual?  The largest positive residual?  (Hint: use `arrange()` in conjunction with `head()` and `tail()`.)

SOLUTION:

```{r}
```
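
The `arrange()`, `head()`, and `tail()` pattern from the hint can be sketched as follows. This is an illustration on the built-in `mtcars` data rather than the solution; the names `cars_demo`, `fit_demo`, `aug_demo`, `most_negative`, and `most_positive` are placeholders introduced here. The key assumption is that the augmented data set still carries an identifying column (here `car`), which is why `data =` is passed to `broom::augment()`.

```{r}
library(dplyr)

# Illustration only: move the row names into an ordinary identifying column
cars_demo <- tibble::rownames_to_column(mtcars, "car")
fit_demo <- lm(mpg ~ wt, data = cars_demo)

# Keep the identifier alongside the residuals
aug_demo <- broom::augment(fit_demo, data = cars_demo)

# Sort by residual: the first row is the most negative, the last the most positive
sorted_demo <- aug_demo %>%
  arrange(.resid) %>%
  select(car, .fitted, .resid)

most_negative <- head(sorted_demo, 1)
most_positive <- tail(sorted_demo, 1)
most_negative
most_positive
```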