Multiple Imputation, Machine-Learning, and Hot Deck Imputation Models with the 2021 NSDUH survey

Abstract Number:

3865 

Submission Type:

Contributed Abstract 

Contributed Abstract Type:

Paper 

Participants:

Mark Brow (1), Jingsheng Yan (2)

Institutions:

(1) Department of Health and Human Services/SAMHSA, N/A, (2) Department of Health and Human Services/SAMHSA, Rockville, MD

Co-Author:

Jingsheng Yan  
Department of Health and Human Services/SAMHSA

First Author:

Mark Brow  
Department of Health and Human Services/SAMHSA

Presenting Author:

Mark Brow  
Department of Health and Human Services/SAMHSA

Abstract Text:

This study evaluates several imputation strategies, including a novel natural language processing (NLP) deep neural network algorithm, against hot deck imputation strategies and complete case analysis on 2021 NSDUH survey data with artificially created missing-at-random (MAR) values. Missing rates are 1.43%, 9%, and 16%. Evaluation metrics include empirical bias (EBias), root mean square error (RMSE), percent coverage, and percentage of correct prediction (PCP). Survey-weighted and unweighted hot deck imputation methods in SAS and a weighted sequential hot deck (WSHD) method in SUDAAN were used, in addition to a multiple imputation by chained equations model, a multiple imputation classification and regression tree (CART) model, and a gradient boosted trees model (xgboost) in R. A novel approach using Google's BERT language model (Bidirectional Encoder Representations from Transformers) involved converting numeric values to data labels to predict the true value. Results: the BERT model had the highest PCP at all three missing rates, the WSHD method performed well at 1.43% missing, and the CART model performed well at 16% missing. This study examines optimal imputation strategies for complex survey data and explores the use of NLP for imputation.
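The evaluation design described above can be illustrated with a minimal Python sketch. It is not the authors' code (the study used SAS, SUDAAN, and R), and the abstract does not give formulas for the metrics, so the definitions below are the ones commonly used in imputation simulation studies and should be read as assumptions. All function and parameter names (`make_mar`, `pcp`, `empirical_bias`, `rmse`, `percent_coverage`, `driver`, `rates`) are hypothetical.

```python
import math
import random


def make_mar(records, target, driver, rates, seed=0):
    """Set `target` to None with a probability that depends only on the
    fully observed `driver` variable, which is what makes the pattern MAR.
    `rates` maps each driver value to its missingness probability (assumed)."""
    rng = random.Random(seed)
    masked = []
    for rec in records:
        rec = dict(rec)  # copy so the complete data stay available as truth
        if rng.random() < rates[rec[driver]]:
            rec[target] = None
        masked.append(rec)
    return masked


def pcp(true_values, imputed_values):
    """Percentage of correct prediction: share of imputed values that
    exactly match the values that were deleted."""
    hits = sum(t == i for t, i in zip(true_values, imputed_values))
    return 100.0 * hits / len(true_values)


def empirical_bias(estimates, truth):
    """Mean of the estimates across replicates minus the complete-data value."""
    return sum(estimates) / len(estimates) - truth


def rmse(estimates, truth):
    """Root mean square error of the estimates around the complete-data value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def percent_coverage(intervals, truth):
    """Percentage of (lower, upper) confidence intervals that contain the truth."""
    covered = sum(lo <= truth <= hi for lo, hi in intervals)
    return 100.0 * covered / len(intervals)
```

In this sketch, `make_mar` would be applied to the complete records once per target missing rate, each imputation method would fill the deleted cells, and the four metric functions would compare the results with the original values. Survey weights are deliberately left out and would need to enter the estimates in a real analysis of complex survey data.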

Keywords:

Machine learning|NSDUH 2021|Imputation|Natural Language Processing (NLP)

Can this be considered for alternate subtype?

Yes

I have read and understand that JSM participants must abide by the Participant Guidelines.

Yes

I understand that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is non-refundable.

I understand