Modern Statistical Methods and Applications that Enrich Society

Organizer

Maryclare Griffin
 
Monday, Aug 4: 8:30 AM - 10:20 AM
0853 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-103C 

Applied

Yes

Main Sponsor

Social Statistics Section

Co-Sponsors

ENAR
Justice Equity Diversity and Inclusion Outreach Group

Presentations

A Statistical Study of Alternative Representational Systems for the LA City Council

The 2022 Los Angeles City Council scandal heightened public demand for governance reform, prompting the formation of the Ad Hoc Committee on City Governance and proposals from civic and academic organizations. Key recommendations included establishing an Independent Redistricting Commission, expanding the City Council, and adopting alternative electoral systems such as multimember districts and ranked-choice voting.

In this project, we provide a rigorous, data-driven approach to evaluating these proposals, focusing on their effects on proportionality, racial representation, and electoral responsiveness. We draw on techniques from both statistics and computer science, including Bayesian ethnicity imputation, ecological inference, and modern graph sampling algorithms for exploring the space of district boundaries. Our hybrid approach yields deeper insight into the political geography of Los Angeles and the challenges of establishing a fair and representative City Council. By providing empirical evidence on the strengths and weaknesses of various districting systems, our work aims to inform policymaking and advance democratic representation in Los Angeles.
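The abstract does not specify which ethnicity imputation method is used. A widely used option is Bayesian Improved Surname Geocoding (BISG), which combines surname-based and location-based evidence through Bayes' rule. The following is a minimal sketch of that standard technique, assuming surname and location are conditionally independent given race. The function names, category labels, and counts are hypothetical and are not drawn from the authors' data.

```python
from typing import Dict


def geo_likelihood(block_counts: Dict[str, int],
                   area_totals: Dict[str, int]) -> Dict[str, float]:
    """P(geo | race): share of each group's area-wide population living in this block."""
    return {race: block_counts.get(race, 0) / total
            for race, total in area_totals.items()}


def bisg_posterior(p_race_given_surname: Dict[str, float],
                   p_geo_given_race: Dict[str, float]) -> Dict[str, float]:
    """P(race | surname, geo), proportional to P(race | surname) * P(geo | race).

    Assumes surname and location are conditionally independent given race.
    """
    unnormalized = {race: p * p_geo_given_race.get(race, 0.0)
                    for race, p in p_race_given_surname.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no group has positive posterior probability")
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs for one voter: a surname-table row and census-block counts.
p_surname = {"hispanic": 0.6, "white": 0.3, "black": 0.1}
block_counts = {"hispanic": 200, "white": 100, "black": 50}
area_totals = {"hispanic": 100_000, "white": 100_000, "black": 50_000}

posterior = bisg_posterior(p_surname, geo_likelihood(block_counts, area_totals))
print(posterior)
```

Posterior probabilities of this kind, summed over voters, are the usual inputs to ecological inference on group voting patterns; how the authors link the two steps is not described in the abstract.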

Co-Author

Sarah Cannon, Claremont McKenna College

Speaker

Evan Rosenman

Estimating Opioid Misuse in Subpopulations: A Bayesian Factor Analysis Approach Using Capture-Recapture and Linked Health Data

The ongoing opioid crisis has highlighted the urgent need for accurate surveillance systems to monitor substance misuse and inform public health interventions. However, fragmented and incomplete data sources hinder reliable estimation of disease burden, particularly among key subpopulations. We propose a Bayesian hierarchical factor analysis model that estimates subpopulation-specific prevalence by jointly modeling how each subpopulation is captured by multiple administrative health data sources within a capture-recapture (CRC) framework. The model accounts for group-specific detection probabilities, referral relationships among data sources, and latent heterogeneity in healthcare-seeking behavior, and it embeds these detection processes within a higher-level model for the underlying prevalence. Simulation studies show that our approach improves estimation efficiency, especially for small subgroups, and resolves model-fitting issues (e.g., zero cells) often encountered in stratified CRC methods. Applying the method to the Massachusetts Public Health Data Warehouse (MA PHD), we show that it provides more stable and interpretable estimates than conventional stratified log-linear CRC approaches, particularly when detection probabilities are similar across sources. This flexible and robust framework enables the use of linked administrative data to identify high-risk populations and supports the development of more efficient public health strategies.
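For intuition about the zero-cell issue, consider the simplest two-source case, a classical baseline rather than the proposed Bayesian model. With n1 and n2 people found in two sources and m found in both, the Lincoln-Petersen estimator is N = n1 * n2 / m, which is undefined when a small stratum has no overlap. The sketch below, using hypothetical stratum counts, also computes Chapman's bias-corrected estimator, which stays finite but can be unstable for small strata. Both estimators assume independent sources and homogeneous detection, which are the kinds of assumptions the proposed model relaxes.

```python
from typing import Optional


def lincoln_petersen(n1: int, n2: int, m: int) -> Optional[float]:
    """Two-source estimate N = n1 * n2 / m; undefined (None) when m == 0."""
    if m == 0:
        return None
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected estimate, finite even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical stratum counts: (seen in source 1, seen in source 2, seen in both).
strata = {"large_group": (100, 80, 20), "small_group": (6, 4, 0)}
for name, (n1, n2, m) in strata.items():
    observed = n1 + n2 - m  # distinct individuals actually seen in either source
    print(name, observed, lincoln_petersen(n1, n2, m), round(chapman(n1, n2, m), 1))
```

In the small stratum only 10 distinct people are observed, yet the Chapman estimate is 34. This shows how sensitive stratum-by-stratum estimates are to sparse overlap cells.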

Speaker

Jianing Wang, Massachusetts General Hospital

Extending respondent-driven sampling to allow modeling of social networks with application to people who inject drugs

Respondent-driven sampling (RDS) is often used to sample hard-to-reach human populations, especially those at risk for transmissible diseases such as HIV and HCV. Because RDS collects samples by following ties in the social network, the resulting dataset contains a tantalizing trace of that network. This raises the question of whether this incidental network information can be used to make inference about the underlying social network that may relate to the transmission of infection. A key limitation is that RDS network information is structurally restricted to tree-structured data: there are no cross-ties, and thus no way to infer endogenous clustering, a key component of disease transmission. In this study, we augment RDS data by distributing tokens that provide a sample of cross-ties, and we introduce a method that uses these data to make inference about the underlying social network.
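The structural limitation can be illustrated with a toy example. Recruitment ties (recruiter to recruit) always form a tree, and a tree contains no triangles, so the closed triads that measure clustering cannot be observed. Adding even a few cross-ties makes triangles countable. The network below is hypothetical, and the sketch only illustrates the data structure, not the authors' inference method.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def count_triangles(edges: List[Edge]) -> int:
    """Count triangles (closed triads) in an undirected graph."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    count = 0
    for nbrs in adj.values():
        # Each pair of neighbors that are themselves tied closes a triangle.
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                count += 1
    return count // 3  # each triangle is found once from each of its 3 nodes


# Hypothetical RDS recruitment chain: seed S recruits A and B, A recruits C and D, B recruits E.
recruitment = [("S", "A"), ("S", "B"), ("A", "C"), ("A", "D"), ("B", "E")]
# Hypothetical cross-ties revealed by token distribution.
cross_ties = [("A", "B"), ("C", "D")]

print(count_triangles(recruitment))               # 0: a tree has no triangles
print(count_triangles(recruitment + cross_ties))  # 2: cross-ties close S-A-B and A-C-D
```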

Speaker

Krista Gile, University of Massachusetts Amherst

Integrating multilevel data to assess Massachusetts food vulnerability

Food insecurity, which includes a lack of consistent access to enough food or a reduced quality, variety, or desirability of diet, is a pressing issue in the United States. Data on local food insecurity are crucial for identifying locations with high food insecurity and formulating interventions. However, because individual-level data are insufficient at the county level, current food insecurity estimates are available only at the state level and thus cannot reflect heterogeneity within a state. We present a methodology that integrates multilevel data to estimate food insecurity at a more granular level. First, we use individual-level data to estimate food insecurity as a function of household characteristics. Second, we estimate the distribution of household characteristics within each county by combining county-level marginal data with the dependency structure observed at the individual level. Combining these two steps yields a county-level food insecurity estimate. We illustrate the method with a case study of Massachusetts. The methodology can also be applied to other quantities, e.g., household food budgets, providing a more comprehensive view of local food affordability.
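The session keywords name iterative proportional fitting (IPF), a standard way to carry out the second step. Below is a minimal two-variable sketch. It assumes a seed cross-tabulation from individual-level data, county marginal totals, and cell-level food insecurity probabilities from the first step. All numbers and category labels are hypothetical, and the authors' probabilistic graphical model may combine the pieces differently.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting for a two-way table.

    seed: strictly positive (r, c) array carrying the dependency structure
          between two household characteristics (from individual-level data).
    row_targets, col_targets: county-level marginal totals; both must sum
          to the same grand total.
    Returns an (r, c) table whose margins match the targets.
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Rescale each row to hit its target total.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Rescale each column; this can disturb the row totals slightly.
        table *= (col_targets / table.sum(axis=0))[None, :]
        if np.allclose(table.sum(axis=1), row_targets, rtol=tol, atol=0.0):
            return table
    raise RuntimeError("IPF did not converge")


# Hypothetical seed: rows = household size (1-2, 3-4, 5+),
# columns = income (below / above a threshold), from survey microdata.
seed = np.array([[30, 10], [20, 20], [10, 30]])
# Hypothetical county marginals (numbers of households).
row_totals = np.array([500, 300, 200])
col_totals = np.array([400, 600])
# Hypothetical step-1 output: P(food insecure | household type).
p_insecure = np.array([[0.25, 0.05], [0.30, 0.08], [0.35, 0.10]])

county_table = ipf(seed, row_totals, col_totals)
# Combine the steps: weight cell-level probabilities by estimated household counts.
county_rate = float((county_table * p_insecure).sum() / county_table.sum())
print(county_table.round(1))
print(round(county_rate, 3))
```

Because IPF only multiplies each row and each column by a scalar, the fitted table keeps the seed's odds ratios. This is how the individual-level dependency structure carries over to the county while the margins come from county data.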

Keywords

probabilistic graphical model

data integration

iterative proportional fitting 

Co-Author

Chaitra Gopalappa, University of Massachusetts Amherst

Speaker

Qian Zhao, University of Massachusetts

Forecasting and Nowcasting with Genomic Data at the Regional Level: Bayesian Approaches for Limited Data

Since the COVID-19 pandemic, caused by SARS-CoV-2, building accurate models for forecasting and nowcasting viral pathogens has taken on increased importance. In particular, it is important to understand the circulation of different variants, because different variants can place different burdens on public health resources. The pandemic also showed that complete real-time data to inform these models are often impossible to obtain, especially for models that predict at subnational resolutions. This underscores the need for models that can produce accurate forecasts from limited data. Abousamra et al. (2024) examined forecasting of SARS-CoV-2 variants at the national level. This paper builds on that work by discussing an extension of multinomial logistic regression (MLR) that accounts for the sparser data available at the state level. The extension uses a hierarchical structure in which states with more data inform estimates for states with less data. In testing, it outperforms a comparable baseline in terms of the energy score, a proper scoring rule for multivariate probabilistic forecasts.
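The energy score generalizes the continuous ranked probability score to vector-valued forecasts. For a forecast distribution F and observation y, ES(F, y) = E||X - y|| - (1/2) E||X - X'||, where X and X' are independent draws from F, and lower values are better. The sketch below gives a plug-in estimate from forecast samples, such as simulated variant-frequency vectors for one state. The array shapes and example values are hypothetical, and this is not the authors' evaluation code.

```python
import numpy as np


def energy_score(samples, observed) -> float:
    """Plug-in estimate of the energy score (lower is better).

    samples: (m, d) array of draws from the forecast distribution,
             e.g., predicted variant-frequency vectors for one state.
    observed: (d,) array of observed values, e.g., observed frequencies.
    Both expectations are replaced by averages over the samples
    (the second over all m * m sample pairs).
    """
    samples = np.asarray(samples, dtype=float)
    observed = np.asarray(observed, dtype=float)
    term1 = np.linalg.norm(samples - observed, axis=1).mean()
    diffs = samples[:, None, :] - samples[None, :, :]
    term2 = np.linalg.norm(diffs, axis=2).mean()
    return float(term1 - 0.5 * term2)


# Hypothetical observed frequencies of three variants in one state.
observed = np.array([0.5, 0.3, 0.2])
sharp_and_right = np.tile(observed, (10, 1))   # all samples equal the truth
biased = np.tile([0.6, 0.2, 0.2], (10, 1))     # all samples miss the truth

print(energy_score(sharp_and_right, observed))  # 0.0
print(energy_score(biased, observed))           # about 0.141
```

When every sample is identical, the second term vanishes and the score reduces to the Euclidean distance between the point forecast and the observation; spread in the samples is rewarded only insofar as it reduces the first term by more than it adds through the second.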

Speaker

Isaac MacArthur