Multiple Imputation, Machine-Learning, and Hot Deck Imputation Models with the 2021 NSDUH survey
Abstract Number:
3865
Submission Type:
Contributed Abstract
Contributed Abstract Type:
Paper
Participants:
Mark Brow (1), Jingsheng Yan (2)
Institutions:
(1) Department of Health and Human Services/SAMHSA, N/A, (2) Department of Health and Human Services/SAMHSA, Rockville, MD
Co-Author:
First Author:
Mark Brow
Department of Health and Human Services/SAMHSA
Presenting Author:
Mark Brow
Department of Health and Human Services/SAMHSA
Abstract Text:
This study evaluates several imputation strategies, including a novel natural language processing (NLP) deep neural network approach, against hot deck imputation strategies and complete case analysis on 2021 NSDUH survey data with artificially created missing-at-random (MAR) values. Missing rates are 1.43%, 9%, and 16%. Evaluation metrics are empirical bias (EBias), root mean square error (RMSE), percent coverage, and percentage of correct prediction (PCP). Survey-weighted and unweighted hot deck imputation methods in SAS and a weighted sequential hot deck (WSHD) method in SUDAAN were used, in addition to a multiple imputation by chained equations model, a multiple imputation classification and regression tree (CART) model, and a gradient boosted trees model (xgboost) in R. The novel approach uses Google's BERT language model and converts numeric values to data labels in order to predict the true value. Results: the BERT model had the highest PCP at all three missing rates, the WSHD method performed well at 1.43% missing, and the CART model performed well at 16% missing. This study examines optimal imputation strategies for complex survey data and explores the use of NLP for imputation.
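The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' formulas, so the definitions below are assumptions based on common usage: bias and RMSE are computed over repeated estimates of one known quantity, coverage is the share of confidence intervals containing that quantity, and PCP is the share of imputed categorical values that match the withheld true values. All function names and sample numbers are hypothetical and are not taken from the study.

```python
import math


def empirical_bias(estimates, true_value):
    """Average signed difference between each estimate and the known truth."""
    return sum(e - true_value for e in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Root mean square error of the estimates around the known truth."""
    squared = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared) / len(squared))


def percent_coverage(intervals, true_value):
    """Percentage of (lower, upper) intervals that contain the known truth."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return 100.0 * hits / len(intervals)


def percent_correct_prediction(imputed, truth):
    """Percentage of imputed values equal to the withheld true values."""
    if len(imputed) != len(truth):
        raise ValueError("imputed and truth must have the same length")
    matches = sum(1 for i, t in zip(imputed, truth) if i == t)
    return 100.0 * matches / len(truth)


# Hypothetical illustration data (not from the NSDUH study).
true_prevalence = 0.50
estimates = [0.52, 0.48, 0.50, 0.54]
intervals = [(0.45, 0.55), (0.40, 0.49), (0.47, 0.53), (0.51, 0.60)]
imputed_labels = [1, 0, 1, 1, 0]
true_labels = [1, 0, 0, 1, 0]

bias = empirical_bias(estimates, true_prevalence)
error = rmse(estimates, true_prevalence)
coverage = percent_coverage(intervals, true_prevalence)
pcp = percent_correct_prediction(imputed_labels, true_labels)
```

In a comparison like the one described, each imputation method would be scored with the same functions at each missing rate, so that methods are ranked on a common footing.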
Keywords:
Machine learning|NSDUH 2021|Imputation|Natural Language Processing (NLP)
Can this be considered for alternate subtype?
Yes
Are you interested in volunteering to serve as a session chair?
No
I have read and understand that JSM participants must abide by the Participant Guidelines.
Yes
I understand that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is non-refundable.
I understand