10/07/2022: 3:30 PM - 4:00 PM CDT
Concurrent
Room: Grand Ballroom Salon F
Few methods are available for showing the variability of variable importance within methods such as random forest. Confidence intervals are used extensively in statistics and can be understood even by those with introductory-level training, especially when shown graphically. In the proposed method, a random forest model is fit as usual; then, using the variable importance from each tree in the forest, bootstrapping is applied to create confidence intervals for each variable's importance. These confidence intervals may be compared with the current methods of Ishwaran and Lu (2018), with examples shown in R, to understand the variables and the interpretation of their importance. For example, if the confidence intervals for two predictors overlap, the predictor ranked higher by mean variable importance is not necessarily more important than the other. These confidence intervals thus allow additional interpretation and understanding of the predictors in the model, which is a common goal when analyzing a dataset with random forest.
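The bootstrap step described in the abstract can be illustrated with a minimal sketch. It is written in standard-library Python rather than R so that it runs without packages, and it is not the authors' implementation. It assumes the per-tree importance scores have already been extracted from a fitted forest into a trees-by-variables table, and it assumes a percentile interval; the abstract does not say which interval type is used. The function names and the simulated scores are hypothetical and serve only to make the example runnable.

```python
import random
import statistics


def bootstrap_importance_ci(per_tree_importance, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for each variable's mean importance.

    per_tree_importance: one row per tree, one column per variable
    (assumed to have been extracted from an already fitted forest).
    Returns a list of (lower, mean, upper) tuples, one per variable.
    """
    rng = random.Random(seed)
    n_trees = len(per_tree_importance)
    n_vars = len(per_tree_importance[0])
    boot_means = [[] for _ in range(n_vars)]

    for _ in range(n_boot):
        # Resample whole trees with replacement so each row stays intact.
        sample = [per_tree_importance[rng.randrange(n_trees)]
                  for _ in range(n_trees)]
        for j in range(n_vars):
            boot_means[j].append(statistics.fmean(row[j] for row in sample))

    alpha = 1.0 - level
    intervals = []
    for j in range(n_vars):
        ordered = sorted(boot_means[j])
        lower = ordered[int((alpha / 2) * (n_boot - 1))]
        upper = ordered[int((1 - alpha / 2) * (n_boot - 1))]
        mean = statistics.fmean(row[j] for row in per_tree_importance)
        intervals.append((lower, mean, upper))
    return intervals


def intervals_overlap(a, b):
    """True if two (lower, mean, upper) intervals share any point."""
    return a[0] <= b[2] and b[0] <= a[2]


# Simulated per-tree scores for three hypothetical predictors.
# These numbers are invented for illustration and are not results.
sim = random.Random(1)
scores = [[sim.gauss(0.30, 0.10), sim.gauss(0.28, 0.10), sim.gauss(0.05, 0.05)]
          for _ in range(200)]

cis = bootstrap_importance_ci(scores)
for name, (lower, mean, upper) in zip(["x1", "x2", "x3"], cis):
    print(f"{name}: mean={mean:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
print("x1 and x2 overlap:", intervals_overlap(cis[0], cis[1]))
```

Resampling whole trees, rather than resampling each variable's scores separately, keeps the scores that came from the same tree together. When two intervals overlap, as the abstract notes, the ranking by mean importance alone may not separate the two predictors.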
Random forest
Bootstrapping
Confidence intervals
Variable importance
Machine learning
Presenting Author
Heather Cook
First Author
Heather Cook
CoAuthor(s)
Daniel Keenan, University of Virginia
Douglas Lake, University of Virginia
Target Audience
Mid-Level
Tracks
Knowledge
Women in Statistics and Data Science 2022