Data Integration in Non-probability Sampling and Small Area Estimation for Official Statistics

Abstract Number:

1717 

Submission Type:

Topic-Contributed Paper Session 

Participants:

Sanjay Chaudhuri (1), Snigdhansu Chatterjee (2), Andreea Erciulescu (3), Julie Gershunskaya (4), Eric Slud (5), Thuan Nguyen (6)

Institutions:

(1) National University of Singapore, (2) University of Minnesota, (3) Westat, (4) US Bureau of Labor Statistics, (5) US Census Bureau, (6) OHSU-PSU School of Public Health

Chair:

Sanjay Chaudhuri  
National University of Singapore

Session Organizer:

Sanjay Chaudhuri  
National University of Singapore

Speaker(s):

Snigdhansu Chatterjee  
University of Minnesota
Andreea Erciulescu  
Westat
Julie Gershunskaya  
US Bureau of Labor Statistics
Eric Slud  
US Census Bureau
Thuan Nguyen  
OHSU-PSU School of Public Health

Session Description:

Model-based information integration techniques have long been used to produce official statistics. Small area estimation methods have proved extremely useful for predicting crucial socio-economic parameters (e.g., poverty prevalence, health indicators, crop yield) in smaller sub-populations that large-scale surveys do not reach in sufficient numbers. Recent advances in information technology have made it possible to collect massive amounts of data, large in both size and dimension, easily and cheaply. Analysis of these complex, often observational, non-probability samples is an emerging topic in fields such as public health, biostatistics, and econometrics.

This session will focus on recent developments in model-based information integration techniques for small area estimation and non-probability sampling. Model-based data augmentation methods allow information to be integrated across surveys of various sizes that cover geographical areas at different levels (e.g., states, counties) and collect data on different variables of interest. This leads to better small area-level predictors, which can support more efficient resource allocation, intervention planning, and information publication. Non-probability samples usually do not represent the target population. This can be corrected by estimating the unknown sampling probabilities, or propensity scores, using information from separate probability samples, which yields inexpensive and accurate parameter estimates.
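As a rough illustration of the propensity-based correction described above, the following minimal Python sketch estimates participation propensities within covariate cells and uses their inverses as pseudo-weights. It is not taken from any of the session talks. The function names, the cell labels, and the toy data are all hypothetical, and a cell-based estimator is assumed here only because it is simple to show; methods in practice typically fit a propensity model on richer covariates.

```python
from collections import defaultdict


def estimate_cell_propensities(nonprob_cells, reference_sample):
    """Estimate the participation propensity in each covariate cell.

    nonprob_cells: cell labels, one per non-probability sample unit.
    reference_sample: (cell, design_weight) pairs from a probability
        sample; the weights are assumed to sum to the population size.
    """
    # Weighted reference counts estimate the population size of each cell.
    pop_size = defaultdict(float)
    for cell, weight in reference_sample:
        pop_size[cell] += weight

    # Raw counts of non-probability units in each cell.
    counts = defaultdict(int)
    for cell in nonprob_cells:
        counts[cell] += 1

    missing = set(counts) - set(pop_size)
    if missing:
        raise ValueError(f"cells absent from reference sample: {sorted(missing)}")

    # Propensity = share of the estimated cell population that participated.
    return {cell: counts[cell] / pop_size[cell] for cell in counts}


def pseudo_weighted_mean(nonprob_sample, reference_sample):
    """Inverse-propensity (Hajek-type) estimate of the population mean.

    nonprob_sample: (cell, y) pairs from the non-probability sample.
    """
    cells = [cell for cell, _ in nonprob_sample]
    propensity = estimate_cell_propensities(cells, reference_sample)
    weights = [1.0 / propensity[cell] for cell in cells]
    weighted_total = sum(w * y for w, (_, y) in zip(weights, nonprob_sample))
    return weighted_total / sum(weights)


# Hypothetical toy data: the reference sample implies 600 "young" and
# 400 "old" people, while the non-probability sample over-represents "young".
reference_sample = [("young", 300.0), ("young", 300.0),
                    ("old", 200.0), ("old", 200.0)]
nonprob_sample = [("young", 0), ("young", 0), ("young", 0),
                  ("young", 1), ("young", 1), ("young", 1),
                  ("old", 1), ("old", 1)]

naive_mean = sum(y for _, y in nonprob_sample) / len(nonprob_sample)
adjusted_mean = pseudo_weighted_mean(nonprob_sample, reference_sample)
print(f"naive mean:    {naive_mean:.3f}")
print(f"adjusted mean: {adjusted_mean:.3f}")
```

In this toy setting the unadjusted mean is 0.625, while the pseudo-weighted mean moves to about 0.7 because the under-represented "old" cell receives larger weights.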

Sanjay Chaudhuri from the Department of Statistics, University of Nebraska-Lincoln will chair this five-speaker session. Dr. Andreea Erciulescu will present an extremely innovative model-based method for combining county-level BRFSS data with county-level data from auxiliary sources while accounting for multi-source error and nested geographical levels. The goal is to predict a crucial health indicator: the county-level prevalence of having a personal doctor. Prof. Snigdhansu Chatterjee will present a ground-breaking development that couples modern machine learning (ML) and big data techniques with non-probability sampling and official data. In contrast to the current paradigm, which models non-probability samples using classical statistical ideas and uses ML mostly as an algorithmic tool (e.g., logistic regression), this talk will couple statistical models with ML-based estimation and inference. Dr. Julie Gershunskaya will discuss in detail the theoretical properties of various estimators of non-probability survey participation probabilities and of population parameters from non-probability samples. Such methods involve complex ways of integrating information from non-probability and reference samples. Prof. Eric Slud from UMD and the US Census Bureau will introduce a novel, smoothly varying metric for assessing benchmarked, time series-based survey weight adjustment methods and discuss its properties. The metric will be illustrated with several proposed weighting schemes for ACS estimates for the years 2018 through 2021. Dr. Thuan Nguyen from OHSU will discuss her recent path-breaking work on assessing uncertainty for classified mixed model prediction.

This session will appeal to a wide range of JSM participants. Researchers in ML, DREI research, poverty mapping, SAE, survey methods, causal inference, and participants seeking sound statistical theory and methods for the production and dissemination of data at all levels of government will be highly interested.

Sponsors:

Government Statistics Section 1
International Indian Statistical Association 2
Survey Research Methods Section 3

Theme: Statistics and Data Science: Informing Policy and Countering Misinformation

Yes

Applied

Yes

Estimated Audience Size

Medium (80-150)

I have read and understand that JSM participants must abide by the Participant Guidelines.

Yes

I understand and have communicated to my proposed speakers that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is nonrefundable.

I understand