Random Forest Clustering for Development of Clinical Phenotypes from Cohort Studies

Michael LaValley, Speaker
Boston University
 
Wednesday, Aug 6: 3:25 PM - 3:45 PM
Invited Paper Session 
Music City Center 
Many medical diagnoses represent heterogeneous conditions that combine a number of subtypes before clinical presentation. Clustering analyses of patients with such diagnoses may reveal these underlying subtypes and help in developing more homogeneous clinical phenotypes, which can then be targeted by more specific treatments to prevent disease progression. We present a nonparametric machine learning approach to clustering patients, based on the Random Forest algorithm, that accommodates the mixed variable types and skewness of standard medical data. To illustrate the approach, we use cohort data from the Multicenter Osteoarthritis Study and from the similarly designed Osteoarthritis Initiative Study to evaluate subtypes of patients undergoing knee replacement surgery, and we compare the cluster results to those obtained with the k-means clustering algorithm. We find that the Random Forest approach produces clusters with greater interpretability and with less impact from study design features than the k-means algorithm.
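
The abstract does not spell out how the Random Forest is turned into a clustering method. One common construction, shown here only as a minimal sketch and not necessarily the procedure used in this work, is the unsupervised random forest: train a forest to tell the real patients apart from synthetic rows built by permuting each variable independently, take the share of trees in which two patients fall in the same leaf as their proximity, and cluster on one minus that proximity. The function name `rf_proximity_clusters`, its parameters, and the choice of average-linkage hierarchical clustering are all assumptions made for illustration. The sketch expects a numeric array, so categorical variables would need to be encoded first.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier


def rf_proximity_clusters(X, n_clusters=2, n_trees=200, seed=0):
    """Hypothetical helper: cluster rows of X via unsupervised random forest proximities."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Synthetic reference data: permute each column independently, which keeps
    # the marginal distributions but breaks the dependence between variables.
    X_synth = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(X.shape[1])]
    )

    # Train a forest to separate real rows (label 1) from synthetic rows (label 0).
    features = np.vstack([X, X_synth])
    labels = np.concatenate([np.ones(n), np.zeros(n)])
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(features, labels)

    # Proximity of two real rows: share of trees in which they share a leaf.
    leaves = forest.apply(X)  # shape (n, n_trees), one leaf index per tree
    proximity = np.zeros((n, n))
    for t in range(leaves.shape[1]):
        proximity += leaves[:, t][:, None] == leaves[:, t][None, :]
    proximity /= leaves.shape[1]

    # Turn proximity into a dissimilarity and apply average-linkage
    # hierarchical clustering (an assumed choice; PAM is another option).
    dissimilarity = 1.0 - proximity
    np.fill_diagonal(dissimilarity, 0.0)
    tree = linkage(squareform(dissimilarity, checks=False), method="average")
    cluster_ids = fcluster(tree, t=n_clusters, criterion="maxclust")
    return cluster_ids, proximity
```

Because tree splits depend only on the ordering of values within each variable, the proximities are unchanged by monotone transformations of skewed variables, which is one reason this kind of dissimilarity suits the data described above. A k-means comparison on the same array could use `sklearn.cluster.KMeans`.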

Keywords

Unsupervised Learning

Classification Trees

Biomedical Data

Osteoarthritis