Wednesday, Aug 7: 10:30 AM - 12:20 PM
5158
Contributed Papers
Oregon Convention Center
Room: CC-E148
Main Sponsor
Survey Research Methods Section
Presentations
Interviewer-administered surveys can suffer from quality issues when question wording differs from the questionnaire. Manual review is required to identify discrepancies and ensure survey quality. RTI QUINTET, a machine learning tool suite, automates quality checks by comparing automatically generated transcripts to the questionnaire. Discrepancies between interviewer administration and the questionnaire are identified, and potentially problematic cases are prioritized for human review. This enhances data quality by identifying re-training opportunities and problematic questionnaire items. We evaluated QUINTET on a telephone healthcare survey with 923 recorded interviews. We compared QUINTET-generated transcripts for a random subset of 21 cases to manual transcriptions of the same cases, treating the manual transcripts as ground truth. Preliminary results indicate 90% accuracy for QUINTET. We explore reasons for differences between human and automated transcripts and suggest future improvements. We also transcribed all interviews to calculate the similarity of each transcript to the questionnaire, manually validating low-similarity cases. We conclude with a discussion of implications for surveys.
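The abstract does not specify how QUINTET scores transcripts against the questionnaire; the sketch below is only an illustration of the general idea, flagging question readings whose wording diverges from the scripted text using a simple sequence-similarity measure. The transcripts, question IDs, and the 0.80 threshold are hypothetical.

```python
# Sketch of a transcript-vs-questionnaire similarity check (illustrative, not QUINTET itself).
from difflib import SequenceMatcher

def similarity(scripted: str, spoken: str) -> float:
    """Sequence similarity between the scripted wording and what was read aloud."""
    return SequenceMatcher(None, scripted.lower(), spoken.lower()).ratio()

def flag_cases(questionnaire: dict[str, str],
               transcripts: dict[str, dict[str, str]],
               threshold: float = 0.80) -> list[tuple[str, str, float]]:
    """Return (case_id, question_id, score) for question readings below the threshold."""
    flagged = []
    for case_id, readings in transcripts.items():
        for q_id, spoken in readings.items():
            score = similarity(questionnaire[q_id], spoken)
            if score < threshold:
                flagged.append((case_id, q_id, score))
    return sorted(flagged, key=lambda x: x[2])  # lowest similarity first for human review

# Example: one case where Q2 was paraphrased rather than read verbatim.
questionnaire = {"Q1": "In general, how would you rate your health?",
                 "Q2": "During the past 30 days, how many days did you exercise?"}
transcripts = {"case_001": {"Q1": "In general, how would you rate your health?",
                            "Q2": "How often did you work out last month?"}}
print(flag_cases(questionnaire, transcripts))
```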
Keywords
Machine Learning
CATI
CARI
Automated Transcription
Survey Administration
Automated Quality Control
I will present the results of an online experiment that evaluated the impact of three incentive payment plans on survey response in online panels. The study involved 500 online panelists drawn from the University of Michigan (U-M) master's student population. Over the course of six months starting in October 2023, they were asked to complete a 10-minute wellbeing survey every two and a half months, for a total of three waves. Participants were randomly assigned to one of three groups: the control group received a $5.00 cash incentive for each completed survey, mirroring typical longitudinal study incentives; the two treated groups received either a $5.00 cash incentive one week before each survey wave or a one-time upfront lump sum of $15.00, paid unconditionally regardless of their actual survey participation. Initial results show that response rates in the treated groups are significantly higher than in the control group across all survey waves, and the treatment effects are robust to the inclusion of covariates. The talk will focus on the underlying theories and mechanisms that drive these results and discuss their implications for longitudinal data collection in online panels.
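For readers who want to see the shape of such an analysis, the following is a minimal sketch of comparing response rates across incentive arms with covariate adjustment. The data are simulated and the arm labels, covariates, and response probabilities are stand-ins, not the study's actual variables or results.

```python
# Illustrative incentive-arm comparison on simulated data (not the authors' code or results).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "arm": rng.choice(["control_postpaid", "prepaid_per_wave", "lump_sum_upfront"], n),
    "age": rng.integers(22, 40, n),
    "female": rng.integers(0, 2, n),
})
# Hypothetical response probabilities by arm, used only to generate fake outcomes.
base = {"control_postpaid": 0.55, "prepaid_per_wave": 0.70, "lump_sum_upfront": 0.68}
df["responded"] = (rng.random(n) < df["arm"].map(base)).astype(int)

# Unadjusted response rates by arm, then a logit with the control arm as reference.
print(df.groupby("arm")["responded"].mean())
model = smf.logit("responded ~ C(arm, Treatment('control_postpaid')) + age + female",
                  data=df).fit(disp=0)
print(model.summary().tables[1])
```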
Keywords
Longitudinal data collections
Probability-based online panels
Panel attrition
Survey nonresponse
Survey incentives
Student mental well-being
In survey methodology, general compliance with protocols and individual interviewer performance have been analyzed using audio recordings. This is a resource-intensive task, since the recordings must be listened to manually. At the same time, little work has been done on analyzing subjective probabilistic expectations questions. In economics, agents form expectations about unknown quantities in order to make decisions, and very often the research problem is to infer the subjective probability distributions that express such expectations. In this paper, we develop a state-of-the-art audio transcription and speaker diarization machine learning pipeline and apply it to audio recordings of a subjective probabilistic expectations question from the Spanish Survey of Household Finances. We first compare the variables produced by the pipeline with a question evaluation sheet completed by the survey team. Then, we evaluate interviewers' question-reading behavior using novel natural language processing techniques. We find that the extracted audio features are useful for assessing compliance and interviewer performance and for detecting interviewer-induced bias in households' reported probabilistic expectations.
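The abstract does not name the components of the pipeline; as a hedged illustration, the sketch below chains two openly available tools, openai-whisper for transcription and pyannote.audio for diarization, and attaches a speaker label to each transcribed segment. The file name and access token are placeholders.

```python
# Minimal transcription + speaker-diarization sketch (illustrative; the paper's actual
# pipeline components are not specified). Assumes the openai-whisper and pyannote.audio
# packages and a Hugging Face access token for the diarization model.
import whisper
from pyannote.audio import Pipeline

AUDIO = "interview.wav"  # hypothetical recording of the expectations question

# 1. Transcribe the recording into timestamped segments.
asr = whisper.load_model("base")
segments = asr.transcribe(AUDIO)["segments"]  # each segment has start, end, text

# 2. Diarize: who speaks when (e.g., interviewer vs. respondent).
diarizer = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
                                    use_auth_token="HF_TOKEN")  # placeholder token
turns = [(t.start, t.end, spk)
         for t, _, spk in diarizer(AUDIO).itertracks(yield_label=True)]

# 3. Attach a speaker label to each transcribed segment by its midpoint.
def speaker_at(time: float) -> str:
    for start, end, spk in turns:
        if start <= time <= end:
            return spk
    return "UNKNOWN"

for seg in segments:
    mid = (seg["start"] + seg["end"]) / 2
    print(f'{speaker_at(mid)}: {seg["text"].strip()}')
```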
Keywords
Machine Learning
Audio Transcription
Survey Methodology
Household Expectations
Survey research on sexual identity often categorizes respondents as heterosexual, homosexual, or bisexual, but may miss more nuanced identities. Prior work has shown that introducing a "something else" response option can affect health disparity estimates; however, many surveys lack this option. We propose a machine learning approach to infer "something else" responses in existing surveys that do not offer this option. Leveraging a split-ballot experiment in the 2015-2019 National Survey of Family Growth, we use the half-sample that included "something else" as a training dataset and a set of supervised machine learning algorithms to develop a classifier for sexual identity. We then use the half-sample that excluded "something else" as a test dataset, predicting responses on the four-category version of sexual identity and computing revised estimates of disparities based on these new predictions. We repeat this process with bootstrap resampling to generate an empirical distribution of revised disparity estimates, comparing them to estimates based on the original half-sample used for training. We conclude with implications of this work for future surveys measuring sexual identity.
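As a rough illustration of the train-on-one-half, predict-on-the-other design with bootstrapped disparity estimates, the sketch below uses entirely synthetic data; the features, classifier choice, outcome, and disparity definition are hypothetical and not the authors' specification or NSFG data.

```python
# Illustrative split-ballot classification + bootstrap sketch on synthetic data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 2000
cols = [f"x{i}" for i in range(5)]
train = pd.DataFrame(rng.normal(size=(n, 5)), columns=cols)
# Fake four-category identity driven by the features (so the classifier has signal).
score = train["x0"] + 0.5 * train["x1"]
train["identity"] = pd.cut(score, [-np.inf, 1.0, 1.5, 2.0, np.inf],
                           labels=["heterosexual", "bisexual", "homosexual", "something_else"])

test = pd.DataFrame(rng.normal(size=(n, 5)), columns=cols)
test["poor_health"] = rng.integers(0, 2, n)  # hypothetical health outcome

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(train[cols], train["identity"].astype(str))
test["pred_identity"] = clf.predict(test[cols])

def disparity(df: pd.DataFrame) -> float:
    """Outcome difference between predicted sexual-minority and heterosexual respondents."""
    minority = df["pred_identity"] != "heterosexual"
    return df.loc[minority, "poor_health"].mean() - df.loc[~minority, "poor_health"].mean()

boot = [disparity(test.sample(frac=1.0, replace=True, random_state=b)) for b in range(500)]
print(np.percentile(boot, [2.5, 50, 97.5]))  # empirical distribution of the revised disparity
```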
Keywords
Sexual Identity Measurement
Machine Learning
Health Disparity Estimates
Survey Research
National Survey of Family Growth (NSFG)
Bootstrap Resampling
To examine the association of interviewer morale with field effort and efficiency, the National Health Interview Survey (NHIS) evaluated an NHIS interviewer support initiative conducted from September to December 2023. NHIS, the nation's gold-standard nationally representative household health survey, is conducted by the National Center for Health Statistics, with data collected by U.S. Census Bureau Field Representatives (FRs). After describing the initiative, which facilitated peer and supervisor encouragement and instrumental support for NHIS FRs in completing their NHIS cases, this paper presents the methods and results of the evaluation. The evaluation consisted of a 2024 NHIS FR survey on FR perspectives on the initiative's benefits and an analysis of 2022-2024 NHIS paradata examining the difference in differences across years in the mean number of days to first contact and the mean number of in-person, phone, and total contact attempts to first contact, per case and per completed case. Differences in these measures between August 2023 and January 2024 (before and after the initiative) are compared with differences in the same measures between August 2022 and January 2023.
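The difference-in-differences contrast described above is a simple calculation; the sketch below shows its arithmetic on made-up summary values (the NHIS paradata themselves are not reproduced here, and the column names are stand-ins).

```python
# Illustrative difference-in-differences computation on hypothetical paradata summaries.
import pandas as pd

para = pd.DataFrame({
    "period": ["Aug2022", "Jan2023", "Aug2023", "Jan2024"],
    "mean_days_to_first_contact": [4.2, 4.6, 4.3, 4.0],  # made-up values
})
m = para.set_index("period")["mean_days_to_first_contact"]

# Change after the initiative (Aug 2023 -> Jan 2024) minus the change over the
# same months one year earlier (Aug 2022 -> Jan 2023).
did = (m["Jan2024"] - m["Aug2023"]) - (m["Jan2023"] - m["Aug2022"])
print(f"Difference-in-differences in mean days to first contact: {did:+.2f}")
```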
Keywords
Interviewer support
Interviewer morale
CAPI survey
Field effort
Field efficiency
Co-Author(s)
Galila Haile, National Center for Health Statistics
Beth Taylor, National Center for Health Statistics (CDC)
Grace Medley, National Center for Health Statistics (CDC)
Maria Villarroel, National Center for Health Statistics (CDC)
Antonia Warren, National Center for Health Statistics (CDC)
Jonaki Bose, National Center for Health Statistics
Lindsay Howden, U.S. Census Bureau
Aaron Maitland, National Center for Health Statistics
Lillian Hoffmann, U.S. Census Bureau
James Dahlhamer, National Center for Health Statistics
First Author
Adena Galinsky
Presenting Author
Adena Galinsky
The integration of wearable sensor data in survey research has the potential to mitigate the recall and response errors that are typical in self-report data. However, such studies are often constrained in scale by implementation challenges and associated costs. This study used NHANES data, which includes both self-report responses and wearable sensor data measuring physical activity, to multiply impute sensor values for NHIS, a larger survey relying solely on interviews. Imputations were performed on synthetic populations to fully account for the complex sample design features.
Cross-validation demonstrated the robust predictive performance of the imputation model. The results showed disparities between sensor estimates and survey self-reports, and these discrepancies varied across subgroups. Imputed estimates in NHIS closely mirrored the observed values in NHANES but tended to have higher standard errors. After imputation, self-reports and sensor data in the combined dataset were used to predict health conditions as a means of evaluating data quality; models with sensor values showed smaller deviance and higher coefficients of determination. The study advanced the existing literature on combining multiple data sources and provided insights into the use of sensor data in survey research.
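To make the general approach concrete, the sketch below multiply imputes a sensor-measured activity variable for respondents who have only self-reports and combines the estimates with Rubin's rules. It uses synthetic data and a generic iterative imputer; the study's actual imputation model, covariates, and synthetic-population step are not shown.

```python
# Hedged sketch of multiple imputation of sensor values for a self-report-only survey.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
n_donor, n_recipient = 800, 2000            # e.g., NHANES-like donors, NHIS-like recipients
n_total = n_donor + n_recipient
self_report = rng.gamma(2.0, 30.0, n_total)  # reported weekly active minutes (fake)
sensor = np.where(np.arange(n_total) < n_donor,
                  0.6 * self_report + rng.normal(0, 15, n_total),
                  np.nan)                    # sensor values missing for the recipient survey
age = rng.integers(18, 80, n_total)
X = pd.DataFrame({"self_report": self_report, "age": age, "sensor": sensor})

M = 20
means, variances = [], []
for m in range(M):
    imp = IterativeImputer(sample_posterior=True, random_state=m)  # draws, not point fills
    completed = pd.DataFrame(imp.fit_transform(X), columns=X.columns)
    est = completed.loc[n_donor:, "sensor"].mean()                 # recipient-survey estimate
    means.append(est)
    variances.append(completed.loc[n_donor:, "sensor"].var(ddof=1) / n_recipient)

# Rubin's rules: total variance = within + (1 + 1/M) * between.
qbar = np.mean(means)
within, between = np.mean(variances), np.var(means, ddof=1)
total_se = np.sqrt(within + (1 + 1 / M) * between)
print(f"MI estimate of mean sensor minutes: {qbar:.1f} (SE {total_se:.1f})")
```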
Keywords
missing data imputation
wearable sensor data
self-report survey
data integration
NHANES
NHIS
Co-Author
Brady West, Institute for Social Research
First Author
Deji Suolang, University of Michigan - Ann Arbor
Presenting Author
Deji Suolang, University of Michigan - Ann Arbor
Using multiple modes of contact can increase participation relative to a single mode. Text messaging has emerged as a new contact mode; however, it is unclear how best to combine texting with mail and email contacts, and what effects these strategies have on response and data quality. To explore the impact of text and email contacts, we designed experiments that varied the number and sequencing of text and email contacts. These were implemented in two waves of the National Survey of Fishing, Hunting, and Wildlife-Associated Recreation, a nationally representative, longitudinal study.
We experimented with the use of text and email invitations and reminders and with the number of reminders sent by each mode. The first study compared text reminders sent early versus later in the field period and the impact of a text invitation. The second study explored the use of text and email invitations and the use of multiple text reminders. We also explored the impact of email invitations depending on whether the email address had been provided for contact purposes only or for a prior survey incentive payment. In the paper, we examine the effects of the experiments on completion rates, response time, sample representation, and item nonresponse.
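For illustration only, a comparison of completion rates across contact-strategy conditions might look like the sketch below; the condition labels and counts are hypothetical and are not results from the study.

```python
# Hypothetical completion-rate comparison across contact-strategy conditions.
import pandas as pd
from scipy.stats import chi2_contingency

counts = pd.DataFrame(
    {"complete": [412, 455, 448], "noncomplete": [588, 545, 552]},
    index=["mail_email_only", "early_text_reminder", "late_text_reminder"],
)
counts["completion_rate"] = counts["complete"] / (counts["complete"] + counts["noncomplete"])
print(counts)

# Overall test of whether completion differs across conditions.
chi2, p, dof, _ = chi2_contingency(counts[["complete", "noncomplete"]])
print(f"chi2={chi2:.2f}, df={dof}, p={p:.3f}")
```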
Keywords
mixed-mode
contact strategies
text messaging
text reminders
response rates
text invitations