Innovative Data Science and Statistical Methods for Small Domain Estimation and Dissemination

Jennifer Parker, Chair
University of Maryland, College Park
 
Kristen Olson, Discussant
University of Nebraska-Lincoln
 
Morgan Earp, Organizer
National Center for Health Statistics
 
Sunday, Aug 3: 4:00 PM - 5:50 PM
0235 
Invited Paper Session 
Music City Center 
Room: CC-104D 

Applied: Yes

Main Sponsor

Survey Research Methods Section

Co-Sponsors

Government Statistics Section
Social Statistics Section

Presentations

Bayesian Unit-level Small Area Estimation Modeling of Longitudinal Survey Data under Informative Sampling

Unit-level models, which model survey responses directly, offer several advantages over area-level models, which model aggregated estimates. However, accounting for a complex survey design is more challenging in the unit-level setting. In particular, little work has been done to extend such models to longitudinal designs, where temporal correlation exists at both the response and domain levels. We consider a Bayesian hierarchical, unit-level, model-based approach that handles Gaussian, binary, and categorical data, incorporates longitudinal dependence and multiscale time series structure, and accounts for informative sampling. To address computational scalability, we develop an efficient Gibbs sampler with appropriate data augmentation. An empirical simulation study compares the proposed approach with models that do not account for unit-level longitudinal correlation. Finally, using public-use microdata, we analyze the Household Pulse Survey, comparing design-based and model-based estimators and demonstrating superior performance for the proposed approaches.
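As a rough illustration of what a Gibbs sampler for a unit-level model involves, here is a deliberately simplified sketch. It is not the authors' model: it assumes a Gaussian response, a single time point (no longitudinal or multiscale structure), and a survey-weighted pseudo-likelihood as one common way to reflect informative sampling. Because every full conditional is conjugate, no data augmentation is needed. Binary or categorical responses would require latent-variable augmentation, which this sketch omits. The function name, the inverse-gamma(0.01, 0.01) priors, and the weight scaling are illustrative assumptions.

```python
import random
import statistics


def gibbs_weighted_unit_level(y, area, w, n_iter=2000, burn=500, seed=1):
    """Hypothetical sketch: Gibbs sampler for y_k = mu + u_area(k) + e_k,
    u_a ~ N(0, tau2), e_k ~ N(0, sigma2), with each unit's likelihood raised
    to its scaled survey weight (pseudo-likelihood). Flat prior on mu and
    inverse-gamma(a0, b0) priors on tau2 and sigma2 are assumed choices.
    Returns the posterior mean of mu + u_a for each area a."""
    rng = random.Random(seed)
    n = len(y)
    # Scale weights to sum to the sample size (a common pseudo-likelihood convention).
    total_w = sum(w)
    wt = [wi * n / total_w for wi in w]
    areas = sorted(set(area))
    members = {a: [k for k in range(n) if area[k] == a] for a in areas}
    a0 = b0 = 0.01
    mu, sigma2, tau2 = statistics.fmean(y), statistics.pvariance(y), 1.0
    u = {a: 0.0 for a in areas}
    draws = {a: [] for a in areas}
    sw = sum(wt)
    for it in range(n_iter):
        # mu | rest: normal with precision sw / sigma2 under a flat prior.
        m = sum(wt[k] * (y[k] - u[area[k]]) for k in range(n)) / sw
        mu = rng.gauss(m, (sigma2 / sw) ** 0.5)
        # u_a | rest: conjugate normal update for each area effect.
        for a in areas:
            idx = members[a]
            prec = sum(wt[k] for k in idx) / sigma2 + 1.0 / tau2
            mean = (sum(wt[k] * (y[k] - mu) for k in idx) / sigma2) / prec
            u[a] = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # sigma2 | rest: inverse gamma, drawn as 1 / Gamma(shape, scale = 1 / rate).
        ss = sum(wt[k] * (y[k] - mu - u[area[k]]) ** 2 for k in range(n))
        sigma2 = 1.0 / rng.gammavariate(a0 + sw / 2, 1.0 / (b0 + ss / 2))
        # tau2 | rest: inverse gamma based on the current area effects.
        su = sum(v * v for v in u.values())
        tau2 = 1.0 / rng.gammavariate(a0 + len(areas) / 2, 1.0 / (b0 + su / 2))
        if it >= burn:
            for a in areas:
                draws[a].append(mu + u[a])
    return {a: statistics.fmean(d) for a, d in draws.items()}


if __name__ == "__main__":
    # Toy data: two areas with small within-area noise and unequal weights.
    ys = [0.1 * ((k % 5) - 2) for k in range(20)] + [10 + 0.1 * ((k % 5) - 2) for k in range(20)]
    areas = ["A"] * 20 + ["B"] * 20
    ws = [1 + (k % 3) for k in range(40)]
    print(gibbs_weighted_unit_level(ys, areas, ws))
```

Raising each unit's likelihood to its scaled weight keeps the conditionals in closed form. This is why a Gibbs sampler is attractive here, though pseudo-likelihood posteriors are known to understate uncertainty unless further adjusted.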

Keywords

Bayesian

Informative Sampling

Longitudinal

Small Area Estimation

Unit-level 

Co-Author(s)

Daniel Vedensky, University of Missouri
Paul Parker, University of California Santa Cruz
Scott Holan, University of Missouri/U.S. Census Bureau

Speaker

Scott Holan, University of Missouri/U.S. Census Bureau

Enhancing Dissemination of Health Estimates for Small Domains – NCHS

Because of small sample sizes and/or a lack of statistical reliability, some estimates for small domains (subpopulations) cannot be disseminated by NCHS. As a result, estimates may be suppressed, or data may be aggregated across domains or time to produce more reliable estimates, which can mask potential differences in outcomes for some groups. Many approaches are available to improve the reliability of estimates for small domains and thereby increase the number of estimates that can be disseminated. This presentation will describe approaches used at NCHS to produce more reliable estimates for small subgroups of interest. These include a new tool for small domain estimation (the enhanced modified Kalman filter) and the use of statistical learning methods to incorporate data from nonprobability surveys that may oversample specific subpopulations. These model-based estimates can fill important data gaps for subpopulations of interest and improve dissemination. However, several challenges and limitations must be acknowledged, including (but not limited to) the bias-variance tradeoff; potential correlation between the selection probabilities for a nonprobability sample and the outcome variable(s) of interest; and a limited set of covariates shared across data sources for use in data integration models.
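To illustrate the general idea of borrowing strength over time for a single domain, below is a minimal sketch of a textbook local-level Kalman filter. It is not the enhanced modified Kalman filter described in the presentation, whose details are not given here. It assumes that each period's direct survey estimate has a known sampling variance and that the true domain value follows a random walk. The function name and the `state_var` parameter are hypothetical.

```python
def local_level_filter(estimates, variances, state_var, init_mean=None, init_var=1e6):
    """Hypothetical sketch of a scalar Kalman filter for a local-level model:
        theta_t = theta_{t-1} + eta_t,   eta_t ~ N(0, state_var)
        yhat_t  = theta_t + e_t,         e_t ~ N(0, variances[t])  (assumed known)
    Returns lists of filtered means and filtered variances of theta_t."""
    if not estimates:
        return [], []
    # Diffuse start: a large initial variance lets the first estimate dominate.
    m = estimates[0] if init_mean is None else init_mean
    p = init_var
    means, filt_vars = [], []
    for y, v in zip(estimates, variances):
        # Predict: the random walk keeps the mean and inflates the variance.
        p_pred = p + state_var
        # Update: the gain weights the new direct estimate by its relative precision.
        gain = p_pred / (p_pred + v)
        m = m + gain * (y - m)
        p = (1 - gain) * p_pred
        means.append(m)
        filt_vars.append(p)
    return means, filt_vars


if __name__ == "__main__":
    # A stable series followed by an abrupt jump in the direct estimate.
    print(local_level_filter([0, 0, 0, 10], [1.0] * 4, state_var=0.1))
```

The filtered variance p_pred * v / (p_pred + v) is always smaller than the direct-estimate variance v, which is the source of the reliability gain. The price is that a genuine abrupt change is tracked only partially and with a lag. This is one concrete form of the bias-variance tradeoff noted in the abstract.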

Co-Author(s)

Lauren Rossen, National Center for Health Statistics
Makram Talih, National Center for Health Statistics

Speaker

Katherine Irimata, National Center for Health Statistics