Application of Machine Learning Models to Blood Metal Exposures in the NHANES Data

Jeffery Jarrett Co-Author
CDC
 
Cynthia Ward Co-Author
CDC
 
Liza Valentin-Blasini Co-Author
CDC
 
Po-Yung Cheng First Author
CDC
 
Po-Yung Cheng Presenting Author
CDC
 
Tuesday, Aug 6: 8:35 AM - 8:40 AM
1964 
Contributed Speed 
Oregon Convention Center 
Identifying high blood metal exposure levels in humans is important because medical interventions or recommendations can then be provided to reduce and prevent future exposures. We aimed to use machine learning to develop identification models. Five machine learning models (Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest (RF)) were applied to NHANES 2015-2016 blood metal data. For blood cadmium (BCd) and lead (BPb) exposures, sex, poverty income ratio (PIR), race, age group, and cotinine level were used as attributes for the models, while for total mercury (THg) exposure we used sex, PIR, race, age group, and shellfish consumption. Blood metal concentrations greater than or equal to the 75th percentile were considered "higher exposure." Model performance was evaluated using accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. The KNN model performed best at predicting BCd and THg exposures, while the LDA model performed best at predicting BPb exposure.
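
The 75th-percentile labeling rule and the five evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `label_higher_exposure` and `classification_metrics` are hypothetical, the concentrations and predictions are made up, and the percentile is assumed to be a simple unweighted sample quantile (the abstract does not say whether NHANES survey weights were applied).

```python
from statistics import quantiles


def label_higher_exposure(concentrations):
    """Return (labels, cutoff): label 1 when a value is >= the 75th percentile."""
    # quantiles(..., n=4) yields the three quartile cut points; index 2 is Q3.
    # Assumption: an unweighted, "inclusive" percentile of the sample itself.
    cutoff = quantiles(concentrations, n=4, method="inclusive")[2]
    labels = [1 if c >= cutoff else 0 for c in concentrations]
    return labels, cutoff


def classification_metrics(y_true, y_pred):
    """Compute the five metrics from binary labels (1 = higher exposure)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Ratios with an empty denominator are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Made-up concentrations for illustration only (not NHANES values).
blood_levels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_true, cutoff = label_higher_exposure(blood_levels)

# Hypothetical predictions standing in for any classifier's output.
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]
for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

Because the cutoff places only about a quarter of participants in the "higher exposure" class, accuracy alone can look favorable for a model that rarely predicts that class, which is one reason to read it alongside sensitivity and the predictive values.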

Keywords

machine learning

metal exposure

NHANES

lead

cadmium

mercury 

Main Sponsor

Section on Statistics and the Environment