Talks in the Statistics and Data Science Colloquium are free and open to the public. They are intended to be accessible to a broad undergraduate audience with some background in statistics and data science. Junior and senior statistics majors are expected to attend talks in the SDS Colloquia (please reach out to Professor Nicholas Horton in case of conflicts).
Information about talks and events sponsored by the Amherst College Data Science Initiative can be found here.
The Department also organizes a series of talks for undergraduates: see https://npflueger.github.io/colloquium for details.
Professor of Statistics and Data Science Smith College
Abstract: The talk describes tidychangepoint, a new R package for changepoint detection analysis. tidychangepoint leverages existing packages like changepoint, GA, tsibble, and broom to provide tidyverse-compliant tools for segmenting univariate time series using various changepoint detection algorithms. In addition, tidychangepoint provides model-fitting procedures for commonly used parametric models, tools for computing various penalized objective functions, and graphical diagnostic displays. tidychangepoint wraps both deterministic algorithms like PELT and flexible, randomized genetic algorithms that can be used with any compliant model-fitting function and any penalized objective function. By bringing all of these disparate tools together in a cohesive fashion, tidychangepoint facilitates comparative analysis of changepoint detection algorithms and models. (This is joint work with Biviana Marcela Suarez Sierra.)
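For readers new to the topic, the idea of a penalized objective function can be illustrated with a minimal sketch. This is not tidychangepoint's API (the package is written in R) and it is not PELT: it is a brute-force search for at most one changepoint under a piecewise-constant mean model, written in Python for illustration. The function names, the squared-error cost, and the fixed per-changepoint penalty are all assumptions made for this example.

```python
def segment_cost(xs):
    """Sum of squared deviations from the segment mean (assumed cost)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def penalized_objective(series, changepoints, penalty):
    """Total within-segment cost plus a fixed penalty per changepoint."""
    bounds = [0] + sorted(changepoints) + [len(series)]
    cost = sum(segment_cost(series[a:b]) for a, b in zip(bounds, bounds[1:]))
    return cost + penalty * len(changepoints)


def best_single_changepoint(series, penalty):
    """Compare 'no changepoint' against every single split by brute force."""
    best = ((), penalized_objective(series, [], penalty))
    for tau in range(1, len(series)):
        score = penalized_objective(series, [tau], penalty)
        if score < best[1]:  # strict: ties keep the simpler model
            best = ((tau,), score)
    return best


# Hypothetical series with a visible shift in mean after the fourth value.
series = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
changepoints, score = best_single_changepoint(series, penalty=1.0)
```

The penalty controls the trade-off between fit and complexity: a larger penalty makes the search prefer fewer segments, so a sufficiently large value returns no changepoint at all.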
Bio: Ben Baumer is a data scientist, with research and teaching focused on extracting meaning from data. This interest is informed by both his graduate work, which focused on discrete mathematics and theoretical computer science, and his professional experience, where he served as the Statistical Analyst for the New York Mets from 2004 to 2012. Ben has published a wide variety of papers and textbooks in network science, sports analytics, data science education, and other related fields.
Professor of Statistics and Dean of the College of Liberal Arts and Studies at the University of Connecticut
Abstract: Missing data, an issue frequently encountered in data analysis, causes difficulties with estimation, precision and inference. Methods for dealing with missing data issues have been studied extensively in the last few decades. Two types of missing values can be present in the same dataset. This talk will explore the probabilistic mechanisms generating the two types of missing values, the conditions under which these mechanisms can be partially or completely ignored, and the use of two-stage multiple imputation (MI) to address the challenge posed by incomplete observations.
Bio: Dr. Harel received his doctorate in statistics in 2003 from the Department of Statistics at the Pennsylvania State University, where he developed his methodological expertise in the areas of missing data techniques, diagnostic tests, longitudinal studies, Bayesian methods, sampling techniques, mixture models, latent class analysis, and statistical consulting. Dr. Harel has been involved with a variety of research fields including, but not limited to, Alzheimer’s, diabetes, cancer, nutrition, HIV/AIDS, health disparities, anti-racism, and alcohol and drug abuse prevention.
Abstract: In the context of building probabilistic ensemble forecasts, it is important to understand the relative importance and contributions of individual models to creating a highly accurate forecast combination. We propose a practical method for evaluating the expected contribution of individual component models using a variation of the Shapley value, a concept from cooperative game theory. This approach relies on considering all possible ensemble models constructed from subsets of individual models. This study was motivated by forecasts submitted to the US COVID-19 Forecast Hub starting in April 2020. This modeling hub produced a probabilistic ensemble forecasting model of COVID-19 cases, hospitalizations, and deaths in the US based on individual models collected from a variety of research groups. We aim to identify which is the most “important” component model on average in helping the ensemble be more accurate. Key results from this work show that (1) the overall importance of an individual model tends to be correlated with the overall prediction accuracy of that model measured by the weighted interval score (WIS), which is a commonly used proper scoring rule for quantile forecasts, and (2) our proposed method clearly shows the contribution of individual models to a more accurate ensemble model, which is difficult to ascertain from the overall WIS alone. This study will offer insights into understanding individual forecasting models’ unique features and their roles in contributing to an ensemble model for a specific prediction task. (This is joint work with Evan Ray and Nicholas Reich.)
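The subset-based idea in the abstract can be sketched in a few lines of Python. This is an illustrative simplification, not the speaker's method: it uses point forecasts scored by mean absolute error in place of quantile forecasts scored by WIS, an equal-weight mean ensemble, and a hypothetical baseline forecast to assign a value to the empty set of models. All names and numbers below are assumptions made for this example.

```python
from itertools import combinations
from math import factorial


def mean_abs_error(pred, truth):
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def coalition_value(members, forecasts, truth, baseline):
    """Negative MAE of the equal-weight mean ensemble (higher is better).

    The empty coalition is scored with the baseline forecast (an assumption).
    """
    if not members:
        return -mean_abs_error(baseline, truth)
    ensemble = [
        sum(forecasts[m][i] for m in members) / len(members)
        for i in range(len(truth))
    ]
    return -mean_abs_error(ensemble, truth)


def shapley_values(forecasts, truth, baseline):
    """Exact Shapley value of each model, enumerating every subset."""
    models = sorted(forecasts)
    n = len(models)
    values = {}
    for m in models:
        others = [x for x in models if x != m]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_m = coalition_value(subset + (m,), forecasts, truth, baseline)
                without_m = coalition_value(subset, forecasts, truth, baseline)
                total += weight * (with_m - without_m)
        values[m] = total
    return values


# Hypothetical observed values, model forecasts, and naive baseline.
truth = [10, 12, 15]
forecasts = {"A": [9, 12, 16], "B": [12, 14, 13], "C": [10, 11, 15]}
baseline = [10, 10, 10]
values = shapley_values(forecasts, truth, baseline)
```

By construction, the Shapley values sum to the improvement of the full ensemble over the baseline. Exact enumeration visits every subset of models, so its cost grows exponentially with the number of component models.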
Bio: Minsu Kim is a PhD student in the Department of Biostatistics and Epidemiology at the University of Massachusetts Amherst. Her research interests now include the evaluation of probabilistic forecasting models and the application of ensemble methods for infectious disease prediction. Additionally, she is keenly interested in machine learning and R package development.
Seeley Mudd Hall is located at the southwest corner of the first-year Quadrangle (31 Quadrangle Drive). Paid parking is available at the Amherst Town Common and Boltwood Drive (approximately 8-minute walk). PVTA Bus Service is available from the Converse Hall stop (approximately 5-minute walk).
Last updated September 30, 2024
Copyright © 2024 Amherst College. All rights reserved.