Here is a partial list of talks in the Amherst College SDS Colloquium Series.
Casey Pattanayak (Wellesley College)
Bonnie Ray, IBM
As one of the largest IT companies in the world, IBM has been at the forefront of systems, software, and physical and mathematical sciences research for many decades. However, only in the last ten years has it begun to systematically apply advanced statistics and optimization capabilities to support its internal business decision making. In this talk, I’ll provide examples of how the Business Analytics and Math Sciences organization within IBM Research has partnered with other IBM divisions to enable data-driven business transformation. For one example, that of predicting performance of new business initiatives, I’ll discuss two different statistical approaches investigated, Regression Trees and Nearest Neighbors with Metric Learning, and discuss their performance from both a technical and a business perspective. I’ll also highlight some open problems with potential for future research.
Casey Pattanayak (Wellesley College)
Lisa Dierker (Wesleyan University)
Kurt Bryan (Rose-Hulman Institute of Technology)
Taylor Arnold, AT&T Laboratories
A Bernoulli distribution describes a process which has only two outcomes. Despite its simplicity, it is used in a wide array of applications including a simple coin toss, the outcome of a sporting event, weather models, and the winner of an election. A careful analysis of the Bernoulli distribution also raises a number of theoretical questions leading to topics such as Bayesian inference, minimaxity and estimator theory. Fortunately, the relatively uncomplicated nature of the Bernoulli model allows such questions to be studied without the advanced mathematical machinery required for a more general treatment. This talk will explore these practical and theoretical considerations, with a focus on the ‘big questions’ which continue to guide modern statistical research. Two real datasets, from baseball and medicine, will be used throughout to guide the discussion.
Sarah Anoke, Department of Biostatistics, Harvard T.H. Chan School of Public Health
Causal inference is a field of statistics focused on measuring a particular type of relationship between two variables. Referring to these two variables as the ‘treatment’ and the ‘outcome’, we consider the value that an individual’s outcome would take if the treatment were present, and the value that the individual’s outcome would take if the treatment were absent. The difference between these two potential outcomes is the treatment effect. Every individual has their own individual treatment effect (ITE). But because only one of these two potential outcomes is observable, ITEs cannot be estimated from observed data. To overcome this problem, the average outcome among a group of individuals unexposed to treatment is subtracted from the average outcome among a group of individuals exposed to treatment, yielding an average treatment effect (ATE). It is of interest to identify subgroups for which the subgroup-specific ATE is very different from the overall ATE. Knowing only the overall ATE is arguably misleading; we would prefer to know, for example, that a drug has no effect within women but a dramatic effect within men. How, then, can the data tell us which subgroups respond particularly well or poorly to treatment, without advance knowledge of these subgroups?
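The difference-in-means calculation described in this abstract can be sketched in a few lines of Python. This is only an illustration, not the speaker's method: the records are made-up toy values, the subgroup labels are assumed to be known in advance (which is exactly what the talk's closing question does not assume), and the names `records` and `average_treatment_effect` are hypothetical.

```python
from statistics import mean

# Hypothetical toy records: (subgroup, treated?, outcome).
# The numbers are invented purely to illustrate the arithmetic.
records = [
    ("men", True, 9.0), ("men", True, 11.0),
    ("men", False, 4.0), ("men", False, 6.0),
    ("women", True, 5.0), ("women", True, 7.0),
    ("women", False, 5.0), ("women", False, 7.0),
]


def average_treatment_effect(rows):
    """Mean outcome of the treated group minus mean outcome of the untreated group."""
    treated = [outcome for _, is_treated, outcome in rows if is_treated]
    untreated = [outcome for _, is_treated, outcome in rows if not is_treated]
    return mean(treated) - mean(untreated)


# Overall ATE pools every individual, regardless of subgroup.
overall_ate = average_treatment_effect(records)

# Subgroup-specific ATEs repeat the same calculation within each subgroup.
subgroup_ate = {
    group: average_treatment_effect([r for r in records if r[0] == group])
    for group in sorted({r[0] for r in records})
}

print(overall_ate)   # 2.5
print(subgroup_ate)  # {'men': 5.0, 'women': 0.0}
```

In this toy data the overall ATE of 2.5 hides a subgroup effect of 5.0 in one group and 0.0 in the other, which is the kind of discrepancy the abstract describes.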
Emily Silverman (Division of Migratory Bird Management, U.S. Fish & Wildlife Service)
Federal management of migratory birds began 100 years ago, when the United States signed the 1916 Convention for the Protection of Migratory Birds with Great Britain (for Canada). These protections were codified in the Migratory Bird Treaty Act (MBTA) of 1918, which now covers over 800 species of birds, and stands as one of the earliest U.S. environmental laws. The evolution of management approaches since the MBTA has led to the development of monitoring programs and quantitative methods in wildlife science. I will present the history of bird monitoring and statistical methods for population assessment and will discuss new approaches, challenges, and how a solid understanding of statistical concepts is essential for informed management. Drawing on examples from my own work, I will highlight the interdisciplinary skills needed to operate effectively as a scientist and statistician in a resource management agency. As our ability to collect information about the natural world expands in an increasingly digital world, the need for innovative, technically adept wildlife scientists is growing.
Andreas Georgiou (Amherst College)
Xihong Lin (Harvard University): this talk was also part of the Connecticut Valley Colloquium Series
The Human Genome Project, in conjunction with the rapid advance of high-throughput technology, has transformed the landscape of health science research. The genetic and genomic era provides an unprecedented promise of understanding genetic underpinnings of complex diseases or traits, studying gene-environment interactions, predicting disease risk, improving prevention and intervention, and advancing precision medicine. A large number of genome-wide association studies conducted in the last ten years have identified over 1,000 common genetic variants that are associated with many complex diseases and traits. Massive targeted, whole exome and whole genome sequencing data as well as different types of -omics data have become rapidly available in the last few years. These massive genetic and genomic data present many exciting opportunities as well as challenges in data analysis and result interpretation. They also call for more interdisciplinary knowledge and research, e.g., in statistics, machine learning, data curation, molecular biology, genetic epidemiology and clinical science. In this talk, I will discuss analysis strategies for some of these challenges, including rare variant analysis of whole-genome sequencing association studies, analysis of multiple phenotypes (pleiotropy), and integrative analysis of different types of genetic and genomic data.
Aaron Coburn (Amherst College)
Rob Kass (Carnegie Mellon University)
Jessica Utts (University of California, Irvine)
Valeria Espinosa (Google)
John Staudenmeyer (University of Massachusetts, Amherst)
Andreas Georgiou (Amherst College)
Joseph Hogan (Brown University)
Patricia Ordonez (University of Puerto Rico Río Piedras)
Laura Balzer (University of Massachusetts, Amherst)
Matthew Rattigan (University of Massachusetts, Amherst)
In recent years, presidential campaigns have become increasingly quantitative in nature. Once dominated by a small group of backroom strategists making gut decisions, modern campaigns have become increasingly reliant on data-backed decision support. Over the past two decades, this “moneyball-ization” of politics has transformed the way campaigns are run and how resources are allocated. In this talk, I will describe my experiences working for the Analytics Department of Obama For America during the 2012 election cycle. As a digital analyst, I worked alongside political scientists, statisticians, and physicists on problems ranging from social media analytics to quantifying the effects of communications and messaging. In addition, I’ll touch upon some of the privacy issues brought up in the 2016 election cycle.
Krista Gile (University of Massachusetts, Amherst)
Respondent-Driven Sampling (RDS) is a type of link-tracing network sampling used to study hard-to-reach populations. Beginning with a convenience sample, each person sampled is given 2-3 uniquely identified coupons to distribute to other members of the target population, making them eligible for enrollment in the study. This is effective at collecting large, diverse samples from many populations. Due to the complexity of the sampling process, inference for even the most fundamental of population features, the population proportion, is challenging and has been the subject of much work in recent years, typically using only data on local network size and the variable of interest. This talk focuses on work that considers inferential goals addressed using multiple variables measured on participants. We describe using data on local network composition for a variable biasing recruitment to adjust for preferential recruitment, semi-parametric testing for bivariate associations in the RDS dataset, and methods for clustering RDS participants based on covariate and referral data.
Matteo Riondato (incoming faculty in CS at Amherst College)
In this talk, I describe the field of Data Mining (DM) from the point of view of a researcher in this discipline. Starting from my definition of DM, I give examples of DM tasks for different kinds of data, commenting on available systems for DM and discussing the algorithmic challenges in DM. I show how my research tackles some of these challenges and list the interesting questions I plan to answer in the near future with the help of Amherst students.
Talks in the Statistics and Data Science Colloquium are free and open to the public. They are intended to be accessible to a broad audience with some background in statistics and data science. Junior and senior statistics majors are expected to attend talks in the SDS Colloquia. Please reach out to Professor Nicholas Horton in case of conflicts.
Seeley Mudd Hall is located at the southwest corner of the first-year Quadrangle (31 Quadrangle Drive). Paid parking is available at the Amherst Town Common and Boltwood Drive (approximately an 8-minute walk). PVTA Bus Service is available from the Converse Hall stop.
Copyright © 2025 Amherst College. All rights reserved.