Leveraging External Data Sources to Improve Federal Government Surveys

Amy Lin Chair
Westat
 
Jean Opsomer Discussant
Westat
 
Gizem Korkmaz Discussant
Westat
 
Minsun Riddles Organizer
Westat
 
Tuesday, Aug 6: 2:00 PM - 3:50 PM
1848 
Topic-Contributed Paper Session 
Oregon Convention Center 
Room: CC-F152 
Government surveys play a critical role in shaping evidence-based policies, fostering informed decision-making, and addressing the evolving needs of the population. However, these surveys are confronted with increasing challenges, including declining response rates. Despite these obstacles, federal statistical agencies are committed to upholding the quality of their data, adapting methodologies, and exploring innovative approaches to ensure the continued effectiveness of this vital tool in policymaking. In an era of data abundance, one strategy to achieve this is through leveraging external data sources to enhance these surveys, aligning with the overarching theme of JSM 2024: 'Statistics and Data Science: Informing policy and countering misinformation.'
This session comprises three presentations. The first two showcase applications that use external data sources to improve efficiency in data collection and reduce nonresponse bias. The third presentation focuses on the evaluation of linked data, assessing its potential to improve the efficiency of federal surveys. Two discussants, representing the fields of survey statistics and data science, will provide insights into each application, discuss potential challenges, and outline next steps for advancing the effectiveness of federal government surveys in the contemporary data landscape.

Applied

Yes

Main Sponsor

Survey Research Methods Section

Co Sponsors

Government Statistics Section
Social Statistics Section

Presentations

Age-Eligibility Oversampling to Reduce Screening Costs in a Multimode Survey

Some surveys have a narrow range of eligibility, targeting particular age subgroups or special populations such as smokers. It is expensive and inefficient to sample households that do not have any eligible people. The National Survey of Family Growth has a target population of Americans aged 15 through 49, so we would like to minimize the sampling of households with people aged 50 and older. In this paper, we discuss a method to oversample households within sampling units that are more likely to be age-eligible.

RTI has an enhanced address frame that includes addresses as well as data from marketing vendors. Using the enhanced frame and historical survey data from a prior, unrelated study, we developed a model to predict whether households have people in the targeted age range. We will discuss the method to build the model, score the model on the sampling frame, and create age-eligibility strata to allocate more of the sample to households with a higher likelihood of eligibility. We use data from 2022 to show how the model performed and how we will change the allocation in future years of data collection.
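The stratify-and-allocate step described above can be illustrated with a minimal sketch. It is not the authors' procedure: the probability cutoffs, the relative sampling rates, the stratum sizes, and all function names are hypothetical, chosen only to show how oversampling strata with higher predicted eligibility raises the expected number of eligible households for a fixed sample size.

```python
def assign_stratum(p_eligible, cutoffs=(0.3, 0.7)):
    """Map a predicted eligibility probability to a stratum label.

    The cutoffs are illustrative assumptions, not values from the abstract.
    """
    low, high = cutoffs
    if p_eligible < low:
        return "low"
    if p_eligible < high:
        return "medium"
    return "high"


def allocate_sample(stratum_sizes, relative_rates, total_sample):
    """Split total_sample across strata so that each stratum's sampling
    rate is proportional to its (hypothetical) relative rate."""
    weighted = {h: stratum_sizes[h] * relative_rates[h] for h in stratum_sizes}
    total = sum(weighted.values())
    return {h: total_sample * weighted[h] / total for h in stratum_sizes}


def expected_eligible_yield(allocation, eligibility_rates):
    """Expected number of sampled households with an eligible person."""
    return sum(allocation[h] * eligibility_rates[h] for h in allocation)


# Hypothetical frame counts and eligibility rates per stratum.
sizes = {"low": 5000, "medium": 3000, "high": 2000}
elig = {"low": 0.2, "medium": 0.5, "high": 0.8}

# Equal rates reproduce proportional allocation; unequal rates oversample.
proportional = allocate_sample(sizes, {"low": 1, "medium": 1, "high": 1}, 1000)
oversampled = allocate_sample(sizes, {"low": 1, "medium": 2, "high": 3}, 1000)

print(expected_eligible_yield(proportional, elig))
print(expected_eligible_yield(oversampled, elig))
```

In practice the gain in eligible yield has to be weighed against the variance increase from unequal weights, which this sketch does not model.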

Co-Author(s)

Stephanie Zimmer, RTI International
Joe McMichael, RTI International
Taylor Lewis

Speaker

Stephanie Zimmer, RTI International

Enhancing Weighting in the National Health and Nutrition Examination Survey (NHANES) with External Data

For many surveys, limited information is available for nonrespondents, which can lead to biased estimates if the nonrespondents have unknown characteristics different from the respondents. Area-level estimates from reliable external sources, such as the American Community Survey (ACS), can be used in the weighting adjustment process to try to reduce this bias. The National Health and Nutrition Examination Survey (NHANES), like many other government surveys, has seen declining response rates in recent years. As a result, the weighting process has been reviewed for potential changes that better account for nonresponse. To address this need, we expanded the use of auxiliary data (e.g., area-level estimates from the ACS and other external sources) and introduced paradata in the form of interviewer observations of sampled households at the first contact attempt. We will show how these are used in both weighting adjustments and the general nonresponse bias assessments for the NHANES August 2021-August 2023 data.
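As a rough illustration of how area-level auxiliary data can enter a nonresponse adjustment, the sketch below applies a generic weighting-class adjustment. It is not the NHANES weighting procedure: the adjustment cells (labeled here by a hypothetical area-level poverty category that could be derived from ACS estimates), the field names, and the function name are all assumptions made for the example.

```python
from collections import defaultdict


def nonresponse_adjust(cases):
    """Weighting-class nonresponse adjustment.

    cases: list of dicts with keys 'base_weight', 'cell', 'responded'.
    Within each cell, respondent weights are inflated by
    (sum of base weights of all sampled cases) /
    (sum of base weights of respondents).
    Nonrespondents receive an adjusted weight of 0.
    """
    total = defaultdict(float)
    resp = defaultdict(float)
    for c in cases:
        total[c["cell"]] += c["base_weight"]
        if c["responded"]:
            resp[c["cell"]] += c["base_weight"]

    adjusted = []
    for c in cases:
        if c["responded"]:
            factor = total[c["cell"]] / resp[c["cell"]]
            adjusted.append(c["base_weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted


# Hypothetical sample: cells defined by an area-level poverty category.
cases = [
    {"base_weight": 10.0, "cell": "high_poverty", "responded": True},
    {"base_weight": 10.0, "cell": "high_poverty", "responded": False},
    {"base_weight": 10.0, "cell": "high_poverty", "responded": False},
    {"base_weight": 10.0, "cell": "high_poverty", "responded": True},
    {"base_weight": 10.0, "cell": "low_poverty", "responded": True},
    {"base_weight": 10.0, "cell": "low_poverty", "responded": True},
]

weights = nonresponse_adjust(cases)
print(weights)
```

The adjustment preserves the total base weight within each cell, so it reduces bias only to the extent that response propensity and the survey outcomes vary across the chosen cells.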

Co-Author(s)

Minsun Riddles, Westat
Matt Jans, National Center for Health Statistics
Te-Ching Chen, CDC/NCHS

Speaker

Jay Clark, Westat

Improving Survey Efficiency with Linked Data: The Survey of Doctorate Recipients Story

Federal surveys face many challenges, including recent increases in data collection costs and declining response rates. These challenges motivate federal agencies to explore alternative sources to meet emerging demands for new data assets. Strategies for improving survey efficiency include augmenting survey data with related data and refining content design. The Survey of Doctorate Recipients (SDR), conducted by the National Center for Science and Engineering Statistics within the National Science Foundation, has a developing data linkage program built around a network of data (including scientific publications, federal awards, patents, and research funding) linked to eligible respondents of the SDR. The linked data are used in this study to evaluate the burden and quality of self-reported data on scientific productivity and receipt of federal support, as well as to assess the potential of replacing or supplementing parts of the survey content with external data. The findings shed light on the feasibility and limitations of using surveys to collect complex data tracking the innovation and output of the doctoral population. The results also identify potential quality issues for external sources.

Co-Author(s)

Lynn Milan, National Center for Science and Engineering Statistics, NSF
Flora Lan, National Center for Science and Engineering Statistics, NSF
Kelly Phou, National Center for Science and Engineering Statistics, NSF

Speaker

Wan-Ying Chang, National Center for Science and Engineering Statistics, NSF