Biostatistical Methods for Correlated Data

Zhuoran Wei Chair
Harvard T.H. Chan School of Public Health
 
Wednesday, Aug 6: 10:30 AM - 12:20 PM
4154 
Contributed Papers 
Music City Center 
Room: CC-201B 

Main Sponsor

Section on Statistics in Epidemiology

Presentations

A Comparison of Testing Procedures for Local and Long Term Effects to Screen for Cancer Biomarkers

We compare three common approaches for identifying longitudinal biomarkers associated with survival outcomes: joint models, conditional models, and time-dependent Cox models. For cancer biomarkers, associations can be acute, meaning the longitudinal trajectory may change sharply just before diagnosis, or long-term, such as differences in levels or slopes that inform risk estimation. Each of the three methods uses a different modeling framework for the joint density of the biomarkers and survival time, and thus has different advantages and disadvantages for detecting local and long-term associations. The current project investigates the power and type I error of the three approaches under different data generation schemes to motivate further methods development for longitudinal biomarker screening in cancer studies. We found that the conditional model can effectively disentangle the acute and long-term effects. We also found that the standard joint model with random intercept and slope does not identify acute effects well, but has slightly higher power than the Cox model for long-term effects.

Keywords

Joint Model

Censored Covariate

Time Dependent Cox Model

Longitudinal Biomarkers 

Co-Author(s)

Paul Albert, National Cancer Institute
Anindya Roy, University of Maryland-Baltimore County
Danping Liu, National Institutes of Health

First Author

Siddharth Roy, National Cancer Institute

Presenting Author

Siddharth Roy, National Cancer Institute

A marginal regression model for longitudinal compositional count with application to microbiome data

Microbiome data from sequencing experiments contain compositional counts of various microbial taxa that exhibit varying levels of zero inflation and overdispersion. We first propose a distribution named adaptively zero-inflated generalized Dirichlet multinomial (AIGDM) that uses GDM to model the relative abundance of the present taxa and the zero-inflation part to model taxa absence when needed. We introduce a likelihood-ratio test to determine the necessity of having the zero-inflation part for each taxon. We then develop an AIGDM-based marginal regression model for longitudinal microbiome compositional counts. The model combines the ability of AIGDM to flexibly model microbial compositions and the ability of the generalized estimating equation method (GEE) to handle correlations between the repeated measures. Under the model, we propose association tests for mean, dispersion, and absence-presence proportion parameters to characterize what aspect of the microbial composition distribution is disrupted by the exposure in a longitudinal study. We also propose an omnibus test by combining these tests to achieve overall power and robustness. 

Keywords

Association test

Compositional data

Generalized Dirichlet multinomial

Sequence count data

Zero inflation 

Co-Author

Zhengzheng Tang, University of Wisconsin-Madison

First Author

Qilin Hong

Presenting Author

Qilin Hong

Association of ordinal traits and genetic variants in pedigree-structured samples by kernel method

In genome-wide association studies (GWAS), logistic regression is one of the most popular analytic methods for binary traits. However, many GWAS methods are limited to binary traits. These methods have often been applied improperly to ordinal traits, which may produce inappropriate results. In this investigation, we develop a framework for the association analysis of ordinal traits and genetic variants in pedigree-structured samples using collapsing and kernel methods. We use the local odds ratios GEE approach to describe the complicated correlation structures between family members and ordered categorical traits. We take a retrospective approach, treating the genetic markers as random variables, to calculate genetic correlations among markers. The proposed genetic association method can accommodate ordinal traits and allows for covariate adjustment. We conduct simulation studies to compare the proposed tests with existing models for analyzing ordered categorical data. We illustrate the proposed tests by analyzing a publicly available dataset.

Keywords

ordinal traits

pedigree-based study

genetic variants

kernel statistic 

First Author

Li-Chu Chien, Kaohsiung Medical University, Taiwan

Presenting Author

Li-Chu Chien, Kaohsiung Medical University, Taiwan

Comparative effects of generalized time-varying treatment strategies with repeated outcomes

We consider the problem of estimating comparative effects of adhering to certain medication strategies on future weight gain based on electronic health records data. This problem presents several methodological challenges. First, this setting involves time-varying treatment strategies with treatment-confounder feedback. Second, the treatment strategies involve dynamic and non-deterministic elements, including grace periods. Third, the outcome is repeatedly measured (e.g., at each follow-up interval) with substantial missingness that follows a nonmonotonic pattern. Fourth, individuals may die during follow-up, in which case weight gain is undefined after death. In this talk, we describe approaches to estimate comparative effects that address the aforementioned challenges in our setting, which we refer to as time-smoothed inverse probability weighted (IPW) approaches. We conducted simulation studies that illustrate efficiency gains of the time-smoothed IPW approach over a more conventional IPW approach that does not leverage the repeated outcome measurements. We then applied the time-smoothed IPW approaches to estimate effects of adhering to medication strategies on weight gain. 
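
As a rough illustration of the weighting idea behind IPW, the following is a minimal sketch of a conventional, single-time-point IPW estimator. It is not the time-smoothed approach described in the abstract, and it ignores grace periods, missingness, and death. The function name `ipw_mean`, the record layout, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def ipw_mean(records, arm):
    """Normalized (Hajek) IPW estimate of the mean outcome under `arm`.

    `records` is a list of (treatment, confounder, outcome) tuples with a
    binary treatment and a discrete confounder. The propensity
    P(treatment = arm | confounder) is estimated by the observed proportion
    within each confounder stratum. Hypothetical helper for illustration.
    """
    n_by_stratum = defaultdict(int)
    n_arm_by_stratum = defaultdict(int)
    for a, l, _ in records:
        n_by_stratum[l] += 1
        if a == arm:
            n_arm_by_stratum[l] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for a, l, y in records:
        if a != arm:
            continue  # only individuals who followed `arm` contribute
        propensity = n_arm_by_stratum[l] / n_by_stratum[l]
        weight = 1.0 / propensity
        weighted_sum += weight * y
        weight_total += weight
    return weighted_sum / weight_total


# Toy data (invented): treatment is more common when the confounder is 1,
# and the outcome is higher when the confounder is 1.
toy = [
    (1, 0, 2.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
    (1, 1, 4.0), (1, 1, 4.0), (1, 1, 4.0), (0, 1, 3.0),
]
effect = ipw_mean(toy, 1) - ipw_mean(toy, 0)
print(effect)
```

In this toy data the within-stratum difference is 1.0 in both strata, while the unweighted difference in means is 2.0; the weighting removes the imbalance in the confounder between arms.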

Keywords

causal inference

electronic health records data

generalized treatment strategies

repeatedly measured outcomes

inverse probability weighting 

Co-Author(s)

Jason Block, Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute
Jessica Young, Harvard Medical School/Harvard Pilgrim HealthCare Institute

First Author

Sean McGrath, Harvard Medical School and Harvard Pilgrim Health Care Institute

Presenting Author

Sean McGrath, Harvard Medical School and Harvard Pilgrim Health Care Institute

Modeling Bivariate Survival with Dependent Censoring Using Copulas

Independent censoring is a key assumption usually made when analyzing time-to-event data. However, this assumption is untestable and can be problematic, particularly in studies with disproportionate loss to follow-up due to adverse events. This paper addresses the challenges associated with dependent censoring by introducing a likelihood-based approach for analyzing bivariate survival data under dependent censoring. A flexible Joe-Hu copula is used to formulate the interdependence within the quadruple times (two events and two censoring times). The marginal distribution of each event or censoring time is modeled via the Cox proportional hazards model. Our estimator possesses consistency and desirable asymptotic properties under regularity conditions. We provide results under extensive simulations with application to prostate cancer data.
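
To make the copula construction concrete, here is a minimal sketch of how a copula links marginal survival functions into a joint survival function. It deliberately substitutes a simpler setup for the one in the abstract: a two-dimensional Clayton copula (an Archimedean family) with exponential margins, rather than a Joe-Hu copula over four times with Cox margins. The function names, rates, and `theta` value are assumptions chosen for illustration.

```python
import math


def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 and u, v in (0, 1]."""
    if theta <= 0:
        raise ValueError("theta must be positive for this sketch")
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def exp_survival(t, rate):
    """Marginal survival function S(t) = exp(-rate * t), assumed exponential."""
    return math.exp(-rate * t)


def joint_survival(t1, t2, rate1, rate2, theta):
    """P(T1 > t1, T2 > t2) obtained by applying the copula to the margins."""
    return clayton_copula(exp_survival(t1, rate1), exp_survival(t2, rate2), theta)


# Illustrative values: larger theta means stronger positive dependence.
s_joint = joint_survival(1.0, 2.0, rate1=0.5, rate2=0.3, theta=2.0)
s_indep = exp_survival(1.0, 0.5) * exp_survival(2.0, 0.3)
print(s_joint, s_indep)
```

Setting one time to zero recovers the other margin, and for positive `theta` the joint survival probability is at least the product of the margins, which is how the copula parameter encodes dependence between the two times.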

Keywords

Archimedean copula

Bivariate Survival

Dependent Censoring

Joe-Hu copula

Joint survival

Prostate Cancer Survival 

Co-Author

Yinghao Pan, University of North Carolina at Charlotte

First Author

Reuben Adatorwovor, University of Kentucky

Presenting Author

Reuben Adatorwovor, University of Kentucky

Multivariate one-sided testing via sample splitting in matched observational studies

When assessing the causal effect of a treatment on two or more outcomes in an observational study, a linear combination of outcomes may lessen the sensitivity of a test of the global null hypothesis to potential unmeasured biases. While all linear combinations of scored outcomes can be considered using Scheffé projections, finding the contrast that minimizes sensitivity to unmeasured biases requires corrections for multiple testing which can erode power, especially when many outcomes are of interest. To mitigate this issue, we propose splitting the sample into a planning sample to identify the optimal contrast and an analysis sample to conduct inference. We introduce a novel minimax theorem for this problem and find that the design sensitivity on the whole sample equals the design sensitivity when using split samples. We also conduct extensive simulation studies demonstrating enhanced power in finite samples. Finally, we apply our method to investigate the broad effects of low family income on children's physical activity and fitness.

Keywords

sensitivity analysis

multiple hypothesis testing

unmeasured confounding 

Co-Author(s)

Dylan Small, University of Pennsylvania
Colin Fogarty, University of Michigan

First Author

William Bekerman, University of Pennsylvania

Presenting Author

William Bekerman, University of Pennsylvania