--- title: "Access Wideband Audiology Immitance database using R and dplyr (Voss PI)" author: "Nicholas Horton (nhorton@amherst.edu)" date: "June 8, 2021" output: pdf_document: fig_height: 4 fig_width: 6 html_document: fig_height: 3 fig_width: 5 word_document: fig_height: 3 fig_width: 5 --- ```{r, include=FALSE} # Don't delete this chunk if you are using the mosaic package # This loads the mosaic and dplyr packages library(mosaic) library(tidyverse) library(RMySQL) ``` ```{r, include=FALSE} # Some customization. You can alter or delete as desired (if you know what you are doing). # This changes the default colors in lattice plots. trellis.par.set(theme=theme.mosaic()) # knitr settings to control how R chunks work. require(knitr) opts_chunk$set( tidy=FALSE, # display code as typed size="small" # slightly smaller font for code ) ``` ## Introduction This document is intended to describe how to access data from a MySQL database using R. It utilizes a database of wideband acoustic immitance variables from humans with normal hearing (see https://projectreporter.nih.gov/project_info_description.cfm?aid=8769352&icde=30039221&ddparam=&ddvalue=&ddsub=&cr=10&csb=default&cs=ASC for more details). A relevant paper on the topic of data management and databases in R can be found at http://chance.amstat.org/2015/04/setting-the-stage. ## Accessing data from a database using SQL commands First I will demonstrate how to access data using SQL (structured query language) commands and the `dbGetQuery()` function. We begin by setting up a connection to the database. ```{r} library(mosaic) library(tidyverse) library(RMySQL) con <- dbConnect(MySQL(), host = "scidb.smith.edu", user = "waiuser", password = "smith_waiDB", dbname = "wai") ``` Next a series of SQL queries can be sent to the database. These return R dataframes. ```{r} dbGetQuery(con, "SHOW TABLES") dbGetQuery(con, "EXPLAIN PI_Info") dbGetQuery(con, "EXPLAIN Subjects") dbGetQuery(con, "EXPLAIN Measurements") ds <- dbGetQuery(con, "SELECT * from Measurements LIMIT 10") ds ``` ## Accessing a database using dplyr commands Alternatively, a connection can be made to the server by creating a series of dplyr table objects. ```{r} db <- DBI::dbConnect(RMySQL::MySQL(), dbname = "wai", host = "scidb.smith.edu", user = "waiuser", password="smith_waiDB") Measurements <- tbl(db, "Measurements") PI_Info <- tbl(db, "PI_Info") Subjects <- tbl(db, "Subjects") ``` #### Let's explore the `PI_Info` table. ```{r} PI_Info %>% collect() %>% summarise(total = n()) PI_Info %>% collect() %>% data.frame() # collect() is a bad idea when dealing with large tables! ``` #### Let's explore the `Subjects` table. ```{r} Subjects %>% collect() # be careful with collect() with large tables! ``` #### Let's explore the `Measurements` table. ```{r error = TRUE} Measurements %>% summarise(total = n()) mtcars %>% summarise(total = n()) ``` #### Let's download the data from a given subject ```{r} onesubj <- Measurements %>% filter(Identifier == "Rosowski_2012", SubjectNumber == 3) %>% collect %>% mutate(SessionNum = as.factor(Session)) head(onesubj) ``` Finally we can plot the results ```{r eval=TRUE} ggplot(data = onesubj, aes(x = Frequency, y = Absorbance)) + geom_point() + aes(colour = Ear) + scale_x_log10() + labs(title="Absorbance by ear Rosowski subject 3") ```