Data Integration in Non-probability Sampling and Small Area Estimation for Official Statistics

Sanjay Chaudhuri, Chair
National University of Singapore
 
Sanjay Chaudhuri, Organizer
National University of Singapore
 
Tuesday, Aug 6: 2:00 PM - 3:50 PM
1717 
Topic-Contributed Paper Session 
Oregon Convention Center 
Room: CC-F151 

Applied: Yes

Main Sponsor

Government Statistics Section

Co-Sponsors

International Indian Statistical Association
Survey Research Methods Section

Presentations

Hierarchical Bayes small area estimation for county‑level health prevalence to having a personal doctor

The complexity of survey data and the availability of data from auxiliary sources motivate researchers to explore estimation methods that extend beyond traditional survey-based estimation. The U.S. Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System (BRFSS) collects a wide range of health information. While the BRFSS focuses on state-level estimation, there is demand for county-level estimation of health indicators using BRFSS data. A hierarchical Bayes small area estimation model is developed to combine county-level BRFSS survey data with county-level data from auxiliary sources, while accounting for various sources of error and nested geographical levels. To mitigate extreme proportions and unstable survey variances, a transformation is applied to the survey data. Model-based county-level predictions of the prevalence of having a personal doctor are constructed for all counties in the U.S., including those where BRFSS survey data were not available. An evaluation study is also presented that compares fitting the model using only the counties with large BRFSS sample sizes against fitting it using all the counties with BRFSS data.

Speaker

Thomas Krenzke, Westat

Accurate Bayesian prediction in small area using samples with selection bias

The small-area framework we consider includes instances where data are available from probability-based surveys as well as non-probability samples, and where the sample sizes from these two sources may be extremely imbalanced. We present a Bayesian algorithm and some alternatives for small-area prediction in this context. Several technical and algorithmic advancements related to the proposed technique considerably broaden the scope for using data with selection bias. We present theoretical advancements as well as results from numerical studies.
 

Speaker

Snigdhansu Chatterjee, University of Minnesota

Comparison of recent methods for combining probability and non-probability samples

The recent proliferation of computers and the internet has opened new opportunities for collecting and processing data. However, such data are often obtained without a well-planned probability survey design. Such non-probability samples cannot be automatically regarded as representative of the population of interest. Several methods for estimation and inference from non-probability samples have been developed in recent years. The methods assume that non-probability sample selection is governed by an underlying latent random mechanism. The basic idea is to use information collected from a probability ("reference") sample to uncover latent non-probability survey participation probabilities (also known as "propensity scores") and use them in the estimation of target finite population parameters. In this paper, we review several recently published methods for estimating non-probability survey participation probabilities, compare their theoretical properties, and study their relative performance in simulations.

Speaker

Julie Gershunskaya, US Bureau of Labor Statistics

Assessment of Effectiveness of Weighting Adjustment using Short Time Series of Survey Estimates at Multiple Geographic Levels

Motivated by Census Bureau research on re-weighting of American Community Survey (ACS) 1-year estimates, this talk considers evaluating the effectiveness of weighting adjustment based on short time series of successive weighted 1-year estimates of selected outcome variables at the national, state, and county levels. The variables to which this method is applied must be carefully selected, using external subject-matter knowledge, to be stable and smoothly varying in the population at each geographic level. The talk will show how to build an evaluation metric from this idea, establish mathematical properties of the metric under ideal conditions of correct and misspecified weighting, and illustrate the metric for several different proposed weighting schemes for ACS estimates for the years 2018 through 2021.

Speaker

Eric Slud, US Census Bureau

Assessing Uncertainty for Classified Mixed Model Prediction

Classified mixed model prediction (CMMP) is a new method that gives traditional mixed model prediction (MMP) a modern flavor. In this work, we consider estimation of the mean squared prediction error (MSPE) of CMMP. A recently proposed method, Sumca, is implemented. Sumca combines analytic and Monte Carlo approaches, leading to a second-order unbiased estimator of the MSPE. The performance of Sumca is investigated via simulation studies, and comparisons are made with alternative methods. The simulation studies show that a brute-force bootstrap method performs almost as well as Sumca, while a naive approach and a Prasad-Rao estimator at the matched index are significantly inferior to Sumca. A real-data application is considered. This work is joint with Jiming Jiang of the University of California, Davis, USA and J. Sunil Rao of the University of Miami, USA.

Speaker

Thuan Nguyen, OHSU