Improving Sexual Identity Measures in Health Disparity Studies with Machine Learning and Resampling
Brady West
Co-Author
Institute for Social Research
Wednesday, Aug 7: 11:20 AM - 11:35 AM
2296
Contributed Papers
Oregon Convention Center
Survey research on sexual identity often categorizes respondents as heterosexual, homosexual, or bisexual, a scheme that may miss more nuanced identities. Prior work has shown that introducing a "something else" response option can affect health disparity estimates. However, many surveys lack this option. We propose a machine learning approach to infer "something else" responses in existing surveys that do not offer it. Leveraging a split-ballot experiment in the 2015-2019 National Survey of Family Growth, we use the half-sample that included "something else" as a training dataset and apply a set of supervised machine learning algorithms to develop a classifier for sexual identity. We then use the half-sample that excluded "something else" as a test dataset, predicting responses on the four-category version of sexual identity and computing revised estimates of disparities based on these predictions. We repeat this process using bootstrap resampling to generate an empirical distribution of revised disparity estimates, and compare these estimates to those based on the original half-sample used for training. We conclude with implications of this work for future surveys measuring sexual identity.
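The train, predict, and resample loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, a simple nearest-centroid rule stands in for the unspecified set of supervised learners, survey weights and the NSFG complex sample design are ignored, and every function name and parameter value is hypothetical.

```python
import random

CATEGORIES = ["heterosexual", "homosexual", "bisexual", "something else"]


def simulate_half_sample(n, rng):
    """Synthetic respondents as (features, identity, outcome). Illustrative only."""
    centers = {"heterosexual": (0.0, 0.0), "homosexual": (3.0, 0.0),
               "bisexual": (0.0, 3.0), "something else": (3.0, 3.0)}
    risk = {"heterosexual": 0.2, "homosexual": 0.3,
            "bisexual": 0.4, "something else": 0.5}
    rows = []
    for _ in range(n):
        identity = rng.choices(CATEGORIES, weights=[85, 5, 7, 3])[0]
        cx, cy = centers[identity]
        features = (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        outcome = 1 if rng.random() < risk[identity] else 0
        rows.append((features, identity, outcome))
    return rows


def fit_centroids(rows):
    """Stand-in classifier: mean feature vector per four-category identity."""
    sums = {}
    for (x, y), identity, _ in rows:
        sx, sy, n = sums.get(identity, (0.0, 0.0, 0))
        sums[identity] = (sx + x, sy + y, n + 1)
    return {k: (sx / n, sy / n) for k, (sx, sy, n) in sums.items()}


def predict(centroids, features):
    """Assign the identity whose centroid is closest in squared distance."""
    x, y = features
    return min(centroids,
               key=lambda k: (x - centroids[k][0]) ** 2 + (y - centroids[k][1]) ** 2)


def disparity(labels, outcomes, group, reference="heterosexual"):
    """Outcome prevalence in `group` minus prevalence in `reference`."""
    def prevalence(target):
        vals = [o for lab, o in zip(labels, outcomes) if lab == target]
        return sum(vals) / len(vals) if vals else None
    g, r = prevalence(group), prevalence(reference)
    return None if g is None or r is None else g - r


def bootstrap_disparities(train, test, group, n_boot, rng):
    """Resample both half-samples, refit, predict, and re-estimate the disparity."""
    estimates = []
    for _ in range(n_boot):
        boot_train = rng.choices(train, k=len(train))
        boot_test = rng.choices(test, k=len(test))
        centroids = fit_centroids(boot_train)
        # The reported identity in the test half is ignored; only features are used.
        predicted = [predict(centroids, f) for f, _, _ in boot_test]
        outcomes = [o for _, _, o in boot_test]
        est = disparity(predicted, outcomes, group)
        if est is not None:
            estimates.append(est)
    return estimates


rng = random.Random(0)
train_half = simulate_half_sample(1000, rng)  # ballot that offered "something else"
test_half = simulate_half_sample(1000, rng)   # ballot that did not offer it
revised = bootstrap_disparities(train_half, test_half, "something else", 100, rng)

# Percentile interval from the empirical distribution of revised estimates.
revised.sort()
lower = revised[int(0.025 * len(revised))]
upper = revised[int(0.975 * len(revised)) - 1]

# Comparison point: the disparity observed directly in the training half-sample.
baseline = disparity([i for _, i, _ in train_half],
                     [o for _, _, o in train_half], "something else")
```

Comparing `baseline` with the interval from `lower` to `upper` mirrors the comparison the abstract describes. An analysis of real survey data would also need to carry the survey weights and design information through each bootstrap replicate.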
Sexual Identity Measurement
Machine Learning
Health Disparity Estimates
Survey Research
National Survey of Family Growth (NSFG)
Bootstrap Resampling
Main Sponsor
Survey Research Methods Section