Tree-Based Machine Learning Methods for Prediction, Variable Selection and Causal Inference

Abstract Number:

1503 

Submission Type:

Professional Development Course/CE  

Participants:

Hemant Ishwaran (1), Min Lu (2)

Institutions:

(1) N/A, N/A, (2) University of Miami, N/A

Co-Instructor:

Min Lu  
University of Miami

Primary Instructor:

Hemant Ishwaran  
N/A

Description:

Tree-based machine learning methods offer several benefits in data analysis, including the ability to capture non-linear relationships, robustness, scalability and the handling of mixed data types. This course emphasizes practical learning through hands-on code examples and result interpretation, both of which are essential for understanding and applying these techniques. Based on the widely popular R package "randomForestSRC", we will present methods for computing predicted outcomes, variable importance indices and causal inference estimates. In addition, we will introduce a new model-independent variable selection method, called rule-based variable priority, and present its implementation in the R package "varPro". For all of these analyses, we will cover different types of outcomes, including continuous, categorical, multivariate, survival and competing risk outcomes. Using example datasets from papers published in medical and public health journals, each topic will be accompanied by hands-on code, working examples and result interpretations. We will also provide code for visualizing model results and constructing coefficient tables for interpretation, and we will address scenarios such as imbalanced classes, unsupervised problems, fast implementation on big data and protection of confidential data.

Instructor Background:

Dr. Ishwaran's research is in the area of machine learning and has focused in particular on developing theory and methods for applying random forests to problems in cardiovascular disease and cancer outcomes research. Dr. Lu's research focuses on developing and integrating random forest approaches for causal inference and variable selection. Dr. Ishwaran and Dr. Lu are dedicated to distributing the widely popular R package "randomForestSRC", which is accompanied by 16 online vignettes, and work together on their new method and software, "varPro", for rule-based, model-independent variable selection.

Course Outline:

Segment 1
Length: 40 minutes
Target audience: analysts who are not familiar with random forest methods
Content: Slide presentation of random forest methods and the "randomForestSRC" package

Segment 2
Length: 40 minutes
Target audience: analysts who need hands-on coding for conducting analysis.
Content: R program presentation of the "randomForestSRC" package

Segment 3
Length: 40 minutes
Target audience: analysts who are interested in advanced methods for variable selection.
Content: Slide presentation introducing a model-independent variable selection method, and R program presentation of the "varPro" package

Segment 4
Length: 40 minutes
Target audience: analysts who are interested in machine learning methods for causal inference.
Content: Slide presentation of causal inference and individual treatment effect estimation, and R program presentation of random forest-based causal inference analysis with result interpretation

Learning Outcomes:

(a) Learning Outcomes. (1) Attendees are expected to be able to conduct a statistical analysis with the introduced models for continuous, categorical and survival outcomes using R packages. (2) Beyond prediction, attendees are expected to be able to visualize, understand and interpret variable selection and causal inference results. (3) Attendees are expected to come away with the view that tree-based machine learning methods are approachable and flexible.

(b) Content and instructional methods. (1) We will present and teach the usage of two R packages, "randomForestSRC" and "varPro", for conducting statistical analyses. (2) We will provide additional code for visualizing model results and constructing coefficient tables, based on example papers published in medical and public health journals. (3) We will include a Q&A session and prepare extra datasets and problems that can be chosen flexibly for demonstration, so that attendees gain a better understanding of problems beyond the scope of the course.

Sponsors:

Biometrics Section 1
Health Policy Statistics Section 2
Section for Statistical Programmers and Analysts 3

Do you need additional equipment for your course?

No

Length of Course (pick 1)

Half Day Course