
Clinical Variables of Different Types: An Approach to Model-Based Clustering of Mixed-Type Data Illustrated in Diet Diaries

Presented During: Better clustering leads to better understanding: genes, microbes, or foods show risk subgroups

Tanzy Love, Speaker
University of Rochester

Thursday, Aug 7: 9:50 AM - 10:15 AM
Invited Paper Session

Music City Center

Our new framework for model-based clustering of data with continuous and discrete variables extends the cluster variance structure framework for Gaussian mixture models set forth by Fraley and Raftery (1999). In modeling how each variable contributes to cluster determination, we allow for relationships both within and between the continuous and discrete variables. This avoids both the creation of latent continuous variables for unordered categories and the simplifying assumption that categorical variables are completely independent of all other clustering variables. In a simulation study, our method showed desirable properties when applied to data whose variables have mixed distributional forms. Applying our clustering methods to prostate cancer data reveals subgroups with different responses to treatment. Applying our methods to nutritional intake data yields similar clusters for mothers and children, based on independently collected data.
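In the Gaussian mixture framework cited above, each cluster covariance is decomposed as Sigma_k = lambda_k D_k A_k D_k^T (volume, orientation, shape), and constraining these pieces across clusters gives a family of variance structures. The sketch below is only an illustration of the general idea of letting discrete and continuous variables both drive cluster membership while staying dependent within a cluster; it is NOT the authors' method. It assumes a hypothetical general-location-style model with one categorical variable c and a continuous vector x: within cluster k, P(c = l | k) = pi_{k,l} and x | (c = l, k) ~ N(mu_{k,l}, Sigma_k), with an unconstrained Sigma_k. All function and parameter names are illustrative.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Multivariate normal log density for each row of x (mean may be per-row)."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    # Solve cov @ y = diff^T instead of forming cov^{-1} explicitly.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def e_step(x, c, weights, cat_probs, means, covs):
    """Posterior cluster probabilities under a hypothetical mixed-type mixture.

    x         : (n, p) continuous variables
    c         : (n,) integer category codes in {0, ..., L-1}
    weights   : (K,) mixing proportions
    cat_probs : (K, L) P(category = l | cluster k)
    means     : (K, L, p) cluster-by-category means (assumed model)
    covs      : (K, p, p) cluster covariance matrices
    """
    n, K = x.shape[0], len(weights)
    log_r = np.empty((n, K))
    for k in range(K):
        # The continuous mean depends on the observed category, so the
        # discrete and continuous variables are dependent within cluster k.
        mu = means[k, c]  # shape (n, p)
        log_r[:, k] = (np.log(weights[k])
                       + np.log(cat_probs[k, c])
                       + gaussian_logpdf(x, mu, covs[k]))
    # Log-sum-exp normalisation for numerical stability.
    m = log_r.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_r - m).sum(axis=1, keepdims=True))
    return np.exp(log_r - log_norm), float(log_norm.sum())


# Toy example with K = 2 clusters, L = 2 categories, p = 2 continuous variables.
x = np.array([[0.1, -0.2], [5.2, 4.9], [0.3, 0.0]])
c = np.array([0, 1, 0])
weights = np.array([0.5, 0.5])
cat_probs = np.array([[0.9, 0.1], [0.1, 0.9]])
means = np.array([[[0.0, 0.0], [1.0, 1.0]],
                  [[4.0, 4.0], [5.0, 5.0]]])
covs = np.stack([np.eye(2), np.eye(2)])

resp, loglik = e_step(x, c, weights, cat_probs, means, covs)
print(resp.argmax(axis=1))
```

A full EM fit would alternate this E-step with weighted updates of the weights, category probabilities, means, and covariances; constraining covs through the eigen-decomposition above is where a variance-structure family would enter.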

Keywords

mixture models

categorical data