05. Clustering for Data with Categorical Outcomes Using a Generalized Linear Mixed Effects Model with Simultaneous Variable Selection

Conference: Women in Statistics and Data Science 2024
10/16/2024: 4:00 PM - 5:00 PM EDT
Speed 

Description

I propose a regularized, model-based clustering method for high-dimensional longitudinal data with categorical outcomes. The method was motivated in part by a study of 177 Thai mother-child dyads that sought to identify risk factors for early childhood caries (ECC), and in part by a dental visit study of 308 pregnant women that sought to identify determinants of successful dental appointment attendance. To my knowledge, no existing method can cluster longitudinal categorical outcomes while simultaneously selecting relevant variables. Within each cluster, a generalized linear mixed-effects model is fit with a convex penalty imposed on the fixed-effect parameters. Within an expectation-maximization (EM) framework, the model coefficients are estimated by coordinate descent combined with the Laplace approximation, and the estimates are then used to cluster subjects via k-means clustering for longitudinal data. The Bayesian information criterion (BIC), evaluated over a grid search, can be used to select the number of clusters and the tuning parameters. In simulation studies, the method performs satisfactorily: it accommodates high-dimensional, multi-level effects and identifies longitudinal patterns in categorical outcomes.
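The penalized estimation and BIC grid search described above can be illustrated, in heavily simplified form, by a lasso-penalized logistic regression fit with coordinate descent, with the penalty level chosen by BIC. This is a minimal sketch under stated assumptions, not the author's method: it assumes a binary outcome, uses the lasso as the example convex penalty, and omits the random effects, the Laplace approximation, the EM iterations, and the clustering step entirely. The function names (`soft_threshold`, `lasso_logistic_cd`, `bic`, `select_lambda`) and the toy data are hypothetical. Each coordinate update uses the quadratic (IRLS) approximation common to glmnet-style solvers.

```python
import numpy as np

# Simplified sketch: binary outcome, lasso as the convex penalty,
# fixed effects only (no random effects, no Laplace approximation,
# no EM, no clustering). All names here are hypothetical.

def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_logistic_cd(X, y, lam, n_outer=50, n_inner=20, tol=1e-6):
    """Minimize -(1/n) * loglik + lam * ||beta||_1 by coordinate descent
    on successive IRLS quadratic approximations. Intercept is unpenalized."""
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_outer):
        eta = b0 + X @ beta
        mu = 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))
        w = np.maximum(mu * (1.0 - mu), 1e-5)   # IRLS weights
        z = eta + (y - mu) / w                  # working response
        b0_old, beta_old = b0, beta.copy()
        for _ in range(n_inner):
            # Weighted least-squares update of the unpenalized intercept.
            b0 = np.sum(w * (z - X @ beta)) / np.sum(w)
            for j in range(p):
                # Partial residual with coordinate j removed from the fit.
                r_j = z - b0 - X @ beta + X[:, j] * beta[j]
                num = np.sum(w * X[:, j] * r_j) / n
                den = np.sum(w * X[:, j] ** 2) / n
                beta[j] = soft_threshold(num, lam) / den
        if abs(b0 - b0_old) + np.max(np.abs(beta - beta_old)) < tol:
            break
    return b0, beta

def bic(X, y, b0, beta):
    """BIC = -2 * loglik + df * log(n), df = intercept + nonzero coefficients."""
    eta = b0 + X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    df = 1 + np.count_nonzero(beta)
    return -2.0 * loglik + df * np.log(len(y))

def select_lambda(X, y, lambdas):
    """Grid search: fit at each lambda and keep the fit with the smallest BIC."""
    best = None
    for lam in lambdas:
        b0, beta = lasso_logistic_cd(X, y, lam)
        score = bic(X, y, b0, beta)
        if best is None or score < best[0]:
            best = (score, lam, b0, beta)
    return best

# Toy synthetic data: two active predictors out of ten.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
true_beta = np.r_[1.5, -1.0, np.zeros(8)]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))
score, lam, b0, beta = select_lambda(X, y, [0.005, 0.01, 0.02, 0.05, 0.1])
print(lam, np.flatnonzero(beta))
```

In a full implementation of the approach described, a fit like this would presumably run within each cluster, with the random effects handled by the Laplace approximation and the BIC grid extended over the number of clusters as well.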

Presenting Author

Samantha Manning, University of Rochester

First Author

Samantha Manning, University of Rochester

Target Audience

Mid-Level

Tracks

Knowledge