Variable Importance Confidence Intervals within Random Forest

Conference: Women in Statistics and Data Science 2022
10/07/2022: 3:30 PM - 4:00 PM CDT
Concurrent 
Room: Grand Ballroom Salon F 

Description

Few methods are available for showing the variability of variable importance in ensemble methods such as random forest. Confidence intervals are used extensively in statistics and, especially when shown graphically, can be understood even by readers with only introductory training. In the proposed method, a random forest model is fit as usual. Then, using the variable importance computed from each tree in the forest, bootstrapping is used to construct a confidence interval for each variable's importance. These confidence intervals may be compared with existing methods by Ishwaran and Lu (2018), with examples in R illustrating the variables and how their importance should be interpreted. For example, if the confidence intervals for two predictors' importance overlap, the predictor ranked higher by mean variable importance is not necessarily more important than the other. These confidence intervals therefore support additional interpretation and understanding of the predictors in the model, which is a common goal when analyzing a dataset with random forest.
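The bootstrap step described above might look roughly like the sketch below. It is written in Python rather than the R used in the talk. It assumes that the per-tree importance scores have already been extracted from the fitted forest; the abstract does not specify which importance measure is used. It also assumes a percentile bootstrap of the mean importance over trees, which is one possible choice and not necessarily the author's. The per-tree values are simulated for illustration and are not results from the talk. `bootstrap_ci` and `intervals_overlap` are hypothetical helper names.

```python
import random
import statistics


def bootstrap_ci(per_tree_importance, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean importance across trees (assumed approach)."""
    rng = random.Random(seed)
    n = len(per_tree_importance)
    boot_means = []
    for _ in range(n_boot):
        # Resample the trees' importance values with replacement.
        resample = rng.choices(per_tree_importance, k=n)
        boot_means.append(statistics.fmean(resample))
    boot_means.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the bootstrap means.
    lower = boot_means[round(n_boot * alpha / 2)]
    upper = boot_means[round(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


def intervals_overlap(ci_a, ci_b):
    """True if two closed intervals (lower, upper) share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Simulated per-tree importances for a hypothetical 500-tree forest (illustrative only).
sim = random.Random(42)
per_tree = {
    "x1": [sim.gauss(0.30, 0.05) for _ in range(500)],
    "x2": [sim.gauss(0.295, 0.05) for _ in range(500)],
    "x3": [sim.gauss(0.05, 0.02) for _ in range(500)],
}

means = {name: statistics.fmean(vals) for name, vals in per_tree.items()}
cis = {name: bootstrap_ci(vals) for name, vals in per_tree.items()}

# Rank variables by mean importance and report each interval.
ranked = sorted(means, key=means.get, reverse=True)
for name in ranked:
    lo, hi = cis[name]
    print(f"{name}: mean={means[name]:.4f}, 95% CI=({lo:.4f}, {hi:.4f})")

# If the top two intervals overlap, the ranking between them is not conclusive.
top, second = ranked[:2]
print("top two overlap:", intervals_overlap(cis[top], cis[second]))
```

Resampling trees treats each tree's importance score as one observation. A different bootstrap interval (for example, a normal-approximation interval) could be substituted for the percentile interval without changing the overlap check.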

Keywords

Random forest

Bootstrapping

Confidence intervals

Variable importance

Machine learning 

Presenting Author

Heather Cook

First Author

Heather Cook

CoAuthor(s)

Daniel Keenan, University of Virginia
Douglas Lake, University of Virginia

Target Audience

Mid-Level

Tracks

Knowledge
Women in Statistics and Data Science 2022