Estimation and Inference in Cluster Randomized Trials with Few Large Clusters for Binary Outcomes

Presented During: From Compositional Microbiome Data to Longitudinal Biomarker Analysis: Cutting-Edge Statistical Methods

Donna Spiegelman Co-Author
Yale School of Public Health

Fan Li Co-Author
Yale School of Public Health

Zachary Frere First Author

Zachary Frere Presenting Author

Thursday, Aug 7: 9:20 AM - 9:35 AM
2767
Contributed Papers

Music City Center

Cluster randomized trials (CRTs) are essential for evaluating cluster-level interventions in medicine and public health. However, many CRTs include only a few clusters, as in hospital-based interventions where a small number of large hospitals are randomized. Conventional methods often require at least 30–40 clusters for reliable inference. This study uses simulations to explore statistical methods for CRTs with binary outcomes when there are 10 or fewer large clusters. We investigate whether asymptotic properties hold in this challenging yet common scenario.
We compare generalized estimating equations (GEE), generalized linear mixed models (GLMM), cluster-level summaries (CLS), and randomization-based methods (RB). In simulations, GLMM and CLS performed best on Type I error and power. RB maintained the nominal Type I error but had lower power than CLS and GLMM. GEE had the worst Type I error control: the standard sandwich variance estimator inflated Type I error, while bias-corrected versions tended to produce Type I error below the nominal level. These findings can guide the choice of analytic methods for CRTs with few but large clusters, supporting more robust inference in real-world settings.
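As a rough illustration of two of the approaches compared above, the following Python sketch combines a cluster-level summary (each cluster's observed event proportion) with an exact randomization test on the difference in mean proportions between arms. It is not the authors' simulation code. The function names, the data-generating model (a normally distributed cluster effect on the probability scale), and all parameter values are assumptions made only for this example.

```python
import itertools
import random
from statistics import mean


def simulate_cluster_proportions(n_clusters=10, cluster_size=500, base_p=0.30,
                                 effect=0.0, sd_between=0.05, seed=0):
    """Simulate a hypothetical two-arm CRT and return cluster-level summaries.

    All defaults are illustrative assumptions, not values from the abstract.
    Returns (props, treated): per-cluster event proportions and arm flags.
    """
    rng = random.Random(seed)
    # Randomize half of the clusters to the intervention arm.
    treated_ids = set(rng.sample(range(n_clusters), n_clusters // 2))
    props, treated = [], []
    for k in range(n_clusters):
        is_treated = k in treated_ids
        # Cluster-specific event probability: baseline + arm effect + cluster noise,
        # clamped so it stays a valid probability.
        p_k = base_p + (effect if is_treated else 0.0) + rng.gauss(0.0, sd_between)
        p_k = min(max(p_k, 0.01), 0.99)
        # Binary outcomes for every individual in the cluster.
        events = sum(rng.random() < p_k for _ in range(cluster_size))
        props.append(events / cluster_size)
        treated.append(is_treated)
    return props, treated


def randomization_test(props, treated):
    """Exact two-sided randomization test on cluster-level proportions.

    Enumerates every way of assigning the same number of clusters to
    treatment and compares each difference in means with the observed one.
    """
    n = len(props)
    n_treated = sum(treated)

    def diff(treated_idx):
        t = [props[i] for i in treated_idx]
        c = [props[i] for i in range(n) if i not in treated_idx]
        return mean(t) - mean(c)

    observed = diff({i for i, flag in enumerate(treated) if flag})
    null_diffs = [diff(set(combo))
                  for combo in itertools.combinations(range(n), n_treated)]
    # Small tolerance so the observed assignment always counts as "as extreme".
    extreme = sum(abs(d) >= abs(observed) - 1e-12 for d in null_diffs)
    return observed, extreme / len(null_diffs)


if __name__ == "__main__":
    props, treated = simulate_cluster_proportions(effect=0.10, seed=1)
    observed, p_value = randomization_test(props, treated)
    print(f"difference in mean proportions: {observed:.3f}, p = {p_value:.3f}")
```

With 10 clusters split 5 and 5, there are only 252 possible assignments, so the smallest attainable two-sided p-value from this test is 2/252 (about 0.008). That coarse reference distribution is one reason power can be limited when clusters are this few.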

Keywords

Cluster Randomized Trials

Multilevel Models

Type I Error

Simulation Study

Few Clusters

Inference

Main Sponsor

ENAR