
Improving Sexual Identity Measures in Health Disparity Studies with Machine Learning and Resampling

Presented During: Innovations in Survey Methodology

Brady West, Co-Author
Institute for Social Research

Rona Hu, First Author and Presenting Author

Wednesday, Aug 7: 11:20 AM - 11:35 AM
Abstract 2296
Contributed Papers

Oregon Convention Center

Survey research on sexual identity often categorizes respondents as heterosexual, homosexual, or bisexual, but this three-category scheme may miss more nuanced identities. Prior work has shown that adding a "something else" response option can affect estimates of health disparities; however, many surveys lack this option. We propose a machine learning approach to infer "something else" responses in existing surveys that do not offer it. Leveraging a split-ballot experiment in the 2015-2019 National Survey of Family Growth, we use the half-sample that was offered "something else" as a training dataset and apply a set of supervised machine learning algorithms to develop a classifier for sexual identity. We then treat the half-sample that was not offered "something else" as a test dataset, predicting responses on the four-category version of the sexual identity item and computing revised disparity estimates based on these predictions. We repeat this process with bootstrap resampling to generate an empirical distribution of the revised disparity estimates, and we compare these estimates to those from the half-sample used for training. We conclude with implications of this work for future surveys that measure sexual identity.
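The pipeline described above (train a classifier on the half-sample that offered "something else", predict four-category identity for the other half, recompute a disparity, and repeat under bootstrap resampling) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a categorical naive Bayes classifier stands in for "a set of supervised machine learning algorithms"; the features (`attraction`, `age`), the binary `outcome`, the helper names, and the disparity measure (an unweighted prevalence difference relative to heterosexual respondents) are all hypothetical; the records are simulated toy data, not NSFG data; and the bootstrap resamples respondents independently, ignoring survey weights and the NSFG's stratified, clustered design, which a real analysis would need to respect.

```python
import math
import random
import statistics
from collections import Counter

# Hypothetical labels: the four-category item from the "something else" ballot.
CATEGORIES = ["heterosexual", "homosexual", "bisexual", "something else"]


def fit_naive_bayes(rows, features, label="identity", alpha=1.0):
    """Fit a categorical naive Bayes classifier with Laplace smoothing."""
    class_counts = Counter(r[label] for r in rows)
    # counts[c][f][v]: number of class-c rows whose feature f equals v
    counts = {c: {f: Counter() for f in features} for c in class_counts}
    levels = {f: set() for f in features}
    for r in rows:
        for f in features:
            counts[r[label]][f][r[f]] += 1
            levels[f].add(r[f])
    return class_counts, counts, levels, alpha


def predict(model, row, features):
    """Return the class with the highest (unnormalized) log posterior."""
    class_counts, counts, levels, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior
        for f in features:
            k = len(levels[f])
            score += math.log((counts[c][f][row[f]] + alpha) / (nc + alpha * k))
        if score > best_score:
            best, best_score = c, score
    return best


def disparity(rows, group_key, group, reference="heterosexual", outcome="outcome"):
    """Unweighted prevalence difference: P(outcome | group) - P(outcome | reference)."""
    def prevalence(g):
        ys = [r[outcome] for r in rows if r[group_key] == g]
        return sum(ys) / len(ys) if ys else math.nan
    return prevalence(group) - prevalence(reference)


def bootstrap_disparities(train, test, features, group="bisexual", B=100, seed=1):
    """Resample both halves, refit, predict on the test half, recompute the disparity."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(B):
        boot_train = [rng.choice(train) for _ in train]
        boot_test = [rng.choice(test) for _ in test]
        model = fit_naive_bayes(boot_train, features)
        predicted = [dict(r, predicted=predict(model, r, features)) for r in boot_test]
        estimates.append(disparity(predicted, "predicted", group))
    return estimates


def simulate(n, rng):
    """Toy records for illustration only; these are NOT NSFG data."""
    attraction = {"heterosexual": "opposite", "homosexual": "same",
                  "bisexual": "both", "something else": "unsure"}
    rows = []
    for _ in range(n):
        ident = rng.choices(CATEGORIES, weights=[85, 4, 7, 4])[0]
        # Reported attraction matches identity 95% of the time; otherwise it is random.
        if rng.random() < 0.95:
            att = attraction[ident]
        else:
            att = rng.choice(list(attraction.values()))
        age = rng.choice(["15-24", "25-34", "35-49"])
        outcome = int(rng.random() < (0.10 if ident == "heterosexual" else 0.25))
        rows.append({"identity": ident, "attraction": att, "age": age, "outcome": outcome})
    return rows


rng = random.Random(2024)
train = simulate(2000, rng)  # stands in for the half-sample WITH "something else"
test = simulate(2000, rng)   # stands in for the other half; its labels are never used
features = ["attraction", "age"]

observed = disparity(train, "identity", "bisexual")
est = bootstrap_disparities(train, test, features, B=100)
q = statistics.quantiles(est, n=40)  # q[0], q[-1]: 2.5th and 97.5th percentiles
print(f"training-half disparity: {observed:.3f}")
print(f"bootstrap percentile range for revised disparity: [{q[0]:.3f}, {q[-1]:.3f}]")
```

Running the script prints the disparity observed in the training half and a bootstrap percentile range for the revised disparity; the numbers reflect only the simulated inputs. Resampling the training half as well as the test half in each replicate lets the spread capture variability from refitting the classifier, not only sampling variability in the half that is relabeled.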

Keywords

Sexual Identity Measurement

Machine Learning

Health Disparity Estimates

Survey Research

National Survey of Family Growth (NSFG)

Bootstrap Resampling


Main Sponsor

Survey Research Methods Section