Wednesday, Aug 7: 8:30 AM - 10:20 AM
5130
Contributed Speed
Oregon Convention Center
Room: CC-E141
Presentations
We compare three common approaches to identifying longitudinal biomarkers associated with survival outcomes: joint models, conditional models, and time-dependent Cox models. For cancer biomarkers, associations can be acute, meaning the longitudinal trajectory changes sharply just before diagnosis, or long-term, such as differences in levels or slopes that inform risk estimation. Each of the three methods uses a different modeling framework for the joint density of the biomarkers and survival time and thus has different advantages and disadvantages for detecting acute and long-term associations. The current project investigates the three approaches' power and type I error under different data-generation schemes to motivate further methods development for longitudinal biomarker screening in cancer studies. We found that the conditional model can effectively disentangle the acute and long-term effects. We also see that the standard joint model with random intercept and slope does not identify acute effects well, but has slightly higher power than the Cox model for long-term effects.
Keywords
Time Dependent Cox Model
Joint Model
Longitudinal Biomarker Screening
Early Detection
Risk Prediction
Conditional Model
Recent advances in spatially resolved transcriptomics (SRT) have illuminated gene co-expression networks in spatial contexts, offering insights into disease mechanisms. However, current methods, mainly designed for single-cell studies, tend to overlook the intricate interactions between spatial location and gene expression networks, and none of them can handle the increasingly prevalent large-scale datasets. To address these limitations, we propose a novel matrix-normal-based method, spMGM, for inferring gene co-expression networks in SRT studies. spMGM accounts for intricate interactions between spatial context and gene expression. Through extensive simulations, both model-based and non-model-based, spMGM accurately recovers the underlying gene co-expression network, improving accuracy by 40%-50% compared to existing methods. Moreover, spMGM can efficiently handle large-scale datasets like 10x Xenium, running 10 times faster than the most advanced existing method. Applying spMGM to breast cancer tissue demonstrates its ability to detect breast cancer-related hub genes that have not been identified by other methods.
Keywords
Gene Co-expression Network
Spatial Transcriptomics
Matrix Normal Model
In examining multiple time-dependent exposures in relation to time-to-event outcomes, the classical Cox regression model is limited in use due to its strong linearity assumption. While several Cox regression models have been developed to bypass this assumption, they overlook temporal variation in the exposure mixture's impact on the log hazard. To bridge this gap, we propose a novel Partial Linear Dynamic Single-Index Cox regression model. This model combines the time-varying impact of exposure on survival risk, captured through an unknown nonparametric single-index function, with the linear effects of additional covariates. We employ a tensor-product regression spline basis to approximate the single-index function and propose a profile optimization algorithm to estimate the model. We also present a likelihood ratio test to compare our proposed model with the simple time-dependent Cox model. After establishing the large-sample properties of the proposed estimator, we evaluate its finite-sample performance under extensive simulation scenarios. We exemplify our model's application with the NYU CHES cohort.
Keywords
environmental exposure
time-dependent exposure
exposure mixture
Cox regression
Microbiome research often conducts differential abundance analysis (DA) to identify microbial features associated with covariates of interest. Recently, concerns about false discoveries from DA have increased, and related statistical research usually attributes them to compositionality (microbial abundances are relative). In this work, we examine another potential cause: unobserved, microbiome-wide confounding (e.g., population structures, unmeasured technical effects). Such effects, often ignored during DA, have been noted to inflate false discovery rates (FDR) in molecular epidemiology, where research shows that low-dimensional factor structures of the data can act as surrogates for confounding and be adjusted for to control FDR. We demonstrate systematic, real-data-based evidence that unobserved confounding consistently inflates FDR in microbiome DA. However, existing factor-based correction methods with simple modifications can effectively address this. We implement such methods in open-source software that can be conveniently integrated with existing DA workflows. Our work is one of the first efforts to evaluate and correct for unobserved confounding to control FDR in microbiome DA.
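The factor-based correction described above can be illustrated with a rough sketch: estimate a few latent factors from the abundance matrix via SVD and project them out before differential abundance testing. The function below is a hypothetical illustration of the general idea, not the authors' software; the samples-by-features layout and the function name are assumptions.

```python
import numpy as np

def remove_latent_factors(Y, k):
    """Estimate k latent factors from a samples-by-features matrix via SVD
    and return residuals after projecting the factors out (a surrogate-
    variable-style adjustment for unobserved confounding)."""
    Yc = Y - Y.mean(axis=0)                     # center each feature
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    F = U[:, :k] * s[:k]                        # top-k factor scores
    proj = F @ np.linalg.pinv(F.T @ F) @ F.T    # projection onto col(F)
    return Yc - proj @ Yc                       # confounder-adjusted residuals
```

In practice the number of factors k would be chosen by an estimator such as parallel analysis, and the residuals (or the factors, included as covariates) would feed into the DA model.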
Keywords
False discovery rate
Unobserved confounding
Microbiome
Differential abundance
Latent factor models
Co-Author(s)
Meghan Shilts, Vanderbilt University Medical Center
Zhouwen Liu, Vanderbilt University Medical Center
Tebeb Gebretsadik, Vanderbilt University, School of Medicine
Christian Rosas-Salazar, Vanderbilt University Medical Center
Suman Das, Vanderbilt University Medical Center
Tina Hartert, Vanderbilt University Medical Center
Chris McKennan, The University of Chicago
Yu Shyr, Vanderbilt University Medical Center
Siyuan Ma
First Author
Chih-Ting Yang, Vanderbilt University
Presenting Author
Chih-Ting Yang, Vanderbilt University
The Cox regression model is widely applied in survival analysis. Time-dependent covariates can be incorporated into the Cox regression model to handle covariates that change over time during the follow-up period. Given the lack of applications of this method to cardiovascular clinical trial data, this presentation illustrates the statistical methodology and application of the Cox regression model with a time-dependent covariate. The results show that post-discharge atrial fibrillation or flutter was a significant predictor of the composite endpoint of death, stroke, or rehospitalization at 2 years, irrespective of treatment (transcatheter heart valve replacement or surgery), among low-risk patients with severe aortic stenosis.
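Fitting a Cox model with a time-dependent covariate typically relies on the counting-process (start, stop] data layout, in which each subject's follow-up is split at the time the covariate changes. The sketch below is a hypothetical illustration of that splitting step (not the study's code) for a binary covariate such as post-discharge atrial fibrillation onset; the resulting rows would then be passed to standard software such as `coxph` in R's survival package or lifelines' `CoxTimeVaryingFitter`.

```python
def counting_process_rows(follow_up, event, change_time):
    """Split one subject's follow-up into (start, stop, covariate, event)
    rows for a binary covariate that switches from 0 to 1 at change_time
    (None if it never switches), as required by the counting-process
    formulation of the time-dependent Cox model."""
    if change_time is None or change_time >= follow_up:
        return [(0.0, follow_up, 0, event)]          # covariate never changes in view
    return [(0.0, change_time, 0, 0),                # pre-switch interval, no event
            (change_time, follow_up, 1, event)]      # post-switch interval carries the event
```

For example, a patient followed 24 months who develops atrial fibrillation at month 6 and later has an event contributes one row with covariate 0 over (0, 6] and one with covariate 1 over (6, 24].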
Keywords
Cox regression model
time dependent covariate
survival analysis
Users linked through a network often exhibit similar behaviors, a phenomenon usually known as network interaction. Users' covariates are also often correlated with their outcomes. Therefore, one should incorporate both the covariates and the network information in a carefully designed randomization to improve estimation of the average treatment effect (ATE) in network hypothesis testing. We propose a new adaptive procedure to balance both the network and the covariates. We show that the imbalance measures with respect to the covariates and the network are Op(1). We also demonstrate the relationship between improved balance and increased efficiency in terms of mean squared error (MSE). Numerical studies demonstrate the superior performance of the proposed procedure regarding the greater comparability of the treatment groups and the reduction in MSE for estimating the ATE.
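The abstract does not spell out the algorithm, but covariate-adaptive designs of this flavor are often built on a biased-coin rule: tentatively place the new unit in each arm, score the resulting imbalance, and favor the imbalance-reducing arm with high probability. A minimal covariate-only sketch follows (the network imbalance term is omitted; all names are hypothetical):

```python
import random

def assign_next(cov_new, arms, covs, p=0.8):
    """Covariate-adaptive biased coin: tentatively place the new unit in
    each arm, score the resulting covariate imbalance, and assign to the
    imbalance-reducing arm with probability p."""
    def imbalance(trial_arm):
        diff = [0.0] * len(cov_new)
        for a, c in zip(arms + [trial_arm], covs + [cov_new]):
            sign = 1.0 if a == 1 else -1.0
            for k, v in enumerate(c):
                diff[k] += sign * v              # arm-1 minus arm-0 covariate totals
        return sum(abs(d) for d in diff)
    i1, i0 = imbalance(1), imbalance(0)
    if i1 == i0:
        return random.randint(0, 1)              # tie: pure randomization
    better = 1 if i1 < i0 else 0
    return better if random.random() < p else 1 - better
```

A network-aware version would simply add a network imbalance term (e.g., the signed count of cross-arm edges) to the score, which is the balancing idea the proposed procedure formalizes.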
Keywords
Adaptive design
Covariate balance
Network balance
Treatment effect estimation
Martingale
Unraveling intricate relationships between diseases and genes poses challenges, demanding intuitive representation through smart visualization. Network analysis has gained prominence as a solution. While one-mode network analysis is common, it often falls short in identifying comprehensive information, such as gene-disease pairs or genes linked to the same environmental factors. In this study, we adopt multi-partite network analysis. A distinctive feature is the network's composition of mutually exclusive sets of nodes, with edges connecting nodes across different sets. Compressed relationships within sets are also explored through multi-level projections. We propose two types of projections for obtaining unipartite projections: sequential projection and concurrent projection. Applying this methodology to the Korean Association Resource (KARE) project, featuring 327,872 SNPs across 8,840 individuals, we considered three distinct datasets: genetic factors, environmental factors, and Metabolic Syndrome components. The resulting multi-partite network and projected lower mode network provided valuable insights into direct and indirect relationships.
Keywords
Multi-Partite Network
projection
genomic data
environmental factor
Identifying an optimal cutoff value of a continuous variable associated with an outcome variable is important for classifying high- and low-risk groups, and plays an important role in tailoring distinct treatments or procedures to each group. When the outcome is survival data with competing risks, methods for selecting the cutoff value of a continuous variable using test statistics have been proposed. Notably, even though concordance-type measures are commonly used to assess discriminatory power in survival data with competing risks, limited attention has been given to determining the cutoff value of a continuous variable using a concordance-type measure. This study proposes methods for determining the cutoff value of a continuous variable associated with a survival outcome with competing risks using concordance-type measures, and evaluates their performance through simulation scenarios. These methods were also applied to real data to assess their practical utility. Simulation results indicate that the proposed methods tend to perform well, even when the true cutoff value deviates from the center of the distribution of the continuous variable.
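As a toy illustration of the cutoff-selection idea, one can dichotomize the marker at each candidate cutoff and keep the cutoff whose indicator maximizes a Harrell-type concordance with the survival outcome. This sketch ignores competing risks, which the proposed methods explicitly handle, and all names are hypothetical:

```python
def c_index(marker, time, event):
    """Harrell-type concordance: among comparable pairs (the earlier time
    is an observed event), count pairs where the higher marker value has
    the shorter survival time; ties in the marker count 1/2."""
    conc = ties = comp = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] and time[i] < time[j]:   # i fails first -> pair comparable
                comp += 1
                if marker[i] > marker[j]:
                    conc += 1
                elif marker[i] == marker[j]:
                    ties += 1
    return (conc + 0.5 * ties) / comp

def best_cutoff(x, time, event):
    """Dichotomize x at each candidate cutoff and keep the cutoff whose
    indicator 1{x > c} maximizes concordance with the survival outcome."""
    cands = sorted(set(x))[:-1]                  # largest value cannot split the sample
    return max(cands, key=lambda c: c_index([int(v > c) for v in x], time, event))
```

A competing-risks version would replace `c_index` with a cause-specific concordance measure based on the cumulative incidence function.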
Keywords
Cutoff value of continuous variable
Survival outcome with competing risk
Concordance type measure
Patients in anti-cancer clinical trials experience treatment toxicity differentially by race (Labriola & George, ASCO, 2021; Mizusawa, Cancer Med, 2023). We aim to better understand differences in the severity of toxicities (or adverse events) by estimating the difference network for severity of toxicities between Black and white samples during their time on immunotherapy clinical trials. The nature of the problem introduces several challenges, including rare occurrences of adverse events, potential recurrence of adverse events, and the dependence of adverse events on the type of treatment. We address these challenges by extending current graphical model frameworks to accommodate our specific data characteristics.
Keywords
probit graphical model
adverse events
Instrumental variable methods have been developed to estimate causal effects in observational studies in the presence of unmeasured confounding. However, existing approaches fall short in estimating average treatment effects (ATE) for time-to-event outcomes, often being restricted to specific survival models and lacking desired statistical properties. In this study, we introduce a novel instrumental variable estimator of the ATE for time-to-event outcomes, based on cumulative incidence functions, accommodating scenarios with or without competing risks. Derived from the efficient influence function, our estimator possesses double robustness and asymptotic efficiency, as theoretically proven and demonstrated via simulations. Our method enables the incorporation of various models for the outcome, treatment, and censoring, including machine learning and ensemble learning algorithms.
Keywords
Causal inference
Competing risks
Conditional average treatment effect
Efficient influence function
Survival Analysis
Targeted maximum likelihood estimation
Abstracts
Background: Efficient planning in clinical trials with time-to-event outcomes hinges on accurate timing predictions for achieving target event numbers. Traditionally, event prediction relies on simple survival models, overlooking the wealth of prognostic clinical markers. We propose a novel approach for event prediction by jointly modeling the clinical markers and the survival outcome.
Statistical Methods: The proposed methodology integrates the time-to-event outcome and patient-level, potentially prognostic longitudinal clinical outcomes. Leveraging the fitted model, we conduct personalized prediction for each at-risk subject.
Results: The simulation studies established the superior predictive performance of the proposed method compared to a benchmark model. Retrospective application in a randomized phase III oncology clinical trial underscored the model's accuracy, surpassing alternative benchmark models.
Conclusions: The proposed novel event prediction method advocates for the adoption of joint modeling as a robust strategy for event prediction. By harnessing the wealth of prognostic clinical markers, this approach improves prediction accuracy in clinical trials.
Keywords
Event Prediction
Joint Modelling
Survival Analysis
Clinical Trials
Bayesian Analysis
Oncology Clinical Trials
This study investigates the impact of unequal censoring on comparisons of survival distributions. We evaluated the performance of five statistical tests: the log-rank (LR), Gehan-Breslow generalized Wilcoxon (GB), Tarone-Ware (TW), Peto-Peto (PP), and modified Peto-Peto (mPP) tests. Using 1,000 simulations comparing two survival curves, we assessed their size and power under four censoring patterns: overall, early, middle, and late censoring (16 combinations in total). Additionally, we explored scenarios with sample sizes of 20 and 50 per group and varying levels of censoring (10% and 25%). Regardless of sample size, censoring percentage, and censoring pattern, the LR test demonstrated the highest power overall, while the GB test showed the least. For a sample size of 20 per group, only early-overall censoring reached more than 80% power for the LR and TW tests. For a sample size of 50 per group, early-overall, early-early, early-middle, and early-late censoring appeared to have minimal impact on all five individual tests. We further investigated the effect of combining their p-values into a single value. The combined p-values using the generalized Fisher method had higher power than the LR test; however, the type I error rates were not as well maintained as those of the LR test.
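For intuition on the combination step, the classical Fisher method (the independence-assuming special case of the generalized method studied here) has a closed form, since the chi-square survival function with even degrees of freedom reduces to a finite sum. A stdlib-only sketch:

```python
from math import exp, log, factorial

def fisher_combined(pvals):
    """Fisher's method: under H0, -2 * sum(log p_i) is chi-square with
    2n degrees of freedom; for even df the survival function has the
    closed form exp(-x/2) * sum_{k<n} (x/2)^k / k!."""
    n = len(pvals)
    x = -2.0 * sum(log(p) for p in pvals)
    half = x / 2.0
    return exp(-half) * sum(half ** k / factorial(k) for k in range(n))
```

The generalized Fisher method used in the study additionally accounts for the dependence among the five tests, which all use the same data; the naive combination above would not preserve the type I error rate in that setting.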
Keywords
Log-rank test
Gehan-Breslow generalized Wilcoxon test
Tarone-Ware test
Peto-Peto test
Modified Peto-Peto test
Survival analysis
Time-course multi-omics experiments are highly informative for obtaining a comprehensive understanding of the dynamic relationships between molecules. A fundamental step in analyzing such data involves selecting a short list of gene regions ("sites"). Two important criteria are the magnitude of change and the temporal dynamic consistency; existing methods consider only one of them. We propose MINT-DE (Multi-omics INtegration of Time-course for Differential Expression analysis), which selects sites based on summarized measures of both aspects. We apply it to a Drosophila development dataset and compare the results with existing methods. The analysis reveals that MINT-DE can identify differentially expressed time-course pairs with the highest correlations. Their corresponding genes are significantly enriched in gene-gene interaction networks and Gene Ontology terms. This suggests the effectiveness of MINT-DE in selecting sites that are both differentially expressed and temporally related across assays, highlighting its potential to identify important sites and complement those neglected by existing methods.
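Edgington's method, listed in the keywords, combines p-values via their sum, whose null distribution is Irwin-Hall (the sum of independent uniforms). A minimal stdlib sketch of that combination rule, offered only as an illustration of the keyword and not as MINT-DE's actual procedure:

```python
from math import comb, factorial, floor

def edgington(pvals):
    """Edgington's method: combine p-values via their sum S; under H0 the
    sum of n independent Uniform(0,1) p-values follows the Irwin-Hall
    distribution, so the combined p-value is its CDF at S."""
    n = len(pvals)
    s = sum(pvals)
    return sum((-1) ** k * comb(n, k) * (s - k) ** n
               for k in range(floor(s) + 1)) / factorial(n)
```

For two p-values of 0.1 each, the combined p-value is the probability that two uniforms sum to at most 0.2, i.e. 0.02.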
Keywords
Multi-omics
Edgington's method
Time-course
Translational regulation
Disease subtype discovery analysis using multi-source 'omics data in an integrative framework is a powerful approach. Such analyses leverage both between- and within-data correlations to identify latent subtype structure in the data. A new integrative similarity network-based clustering method, nNMF, is proposed using non-negative matrix factorization. The method uses the consensus matrices generated by the intNMF algorithm on each type of data as a network among the patient samples. The networks are then fused into a single comprehensive network structure that optimizes the strengths of the relationships. Spectral clustering is then applied to the final network to determine the cluster groups. The method is illustrated with simulated and real-life datasets obtained from The Cancer Genome Atlas studies on glioblastoma, lower-grade glioma, and head and neck cancer. nNMF performs competitively with, and sometimes better than, previous NMF or model-based methods. The novel nNMF method allows researchers to identify the latent subtype structure inherent in the data so that further association studies can be carried out.
Keywords
Integration
nNMF
Latent
Network
Spectral
In the context of HIV patients, over 20 distinct opportunistic infections (OIs) present complex effects on the health trajectory and associated mortality. It is crucial to differentiate among these OIs to devise tailored strategies to enhance patients' survival and quality of life. However, existing statistical frameworks for studying causal mechanisms have limitations, either focusing on single mediators or lacking the ability to handle unmeasured confounding, especially for the survival outcomes. In this work, we propose a novel joint modeling approach that considers multiple recurrent events as mediators and survival endpoints as outcomes, relaxing the assumption of "sequential ignorability" by utilizing the shared random effect to handle unmeasured confounders. We assume the multiple mediators are not causally related to each other given observed covariates and the shared frailty. Simulation studies demonstrate good finite sample performance of our methods in estimating both model parameters and multiple mediation effects. We apply our approach to an AIDS study and find that distinct pathways through the two treatments and CD4 counts impact overall survival via different OIs.
Keywords
Causal Inference
Joint Modeling
Multiple Mediation Analysis
Recurrent Event
Survival Data
Microbial time-series data pose unique challenges, including intricate covariate dependencies and diverse longitudinal study designs. Existing methods for profiling, modeling, and visualizing microbiomics data often fall short in addressing these challenges due to their lack of versatility, data type specificity, or failure to account for the compositional nature of the data. In response, we introduce LegATo, an open-source suite comprising modeling, visualization, and statistical software tools tailored for the analysis of microbiome dynamics. LegATo offers a user-friendly interface, making it accessible for researchers dealing with various study structures. Particularly well-suited for longitudinal microbiomics and transcriptomics data, our package incorporates joint Generalized Estimating Equation (GEE) models specifically crafted to accommodate compositional data. This toolkit will allow researchers to determine which microbial taxa are affected over time by perturbations such as the onset of disease or lifestyle choices, and to predict the effects of these perturbations over time, including changes in composition or stability of commensal bacteria.
Keywords
generalized estimating equations
linear mixed models
longitudinal data analysis
metagenomics
microbiome
compositional data
In oncology drug development, overall response rate (ORR) is commonly used as an early measure to assess the activity of a new drug. However, ORR is often not very informative about longer-term clinical benefit, depending on the specific indication and class of therapy. Existing endpoints in the literature based on tumor growth dynamics (TGD), i.e., continuous longitudinal tumor size data, provide better predictions of overall survival (OS) than ORR. But they have limitations, such as requiring a minimum duration of follow-up for a patient to be included in the analysis, leading to biased results. We aim to address this gap by 1) proposing multiple TGD-based endpoints with an imputation mechanism so that all available data from all patients can be utilized, and 2) developing a broad framework to simulate clinical trials with a variety of tumor growth and reduction rates to perform an unbiased comparison of the proposed endpoints and ORR. Extensive simulations and validation indicate that some TGD endpoints consistently dominate ORR by having a comparable or higher correlation with long-term OS. Thus, the TGD endpooints are recommended as an alternative to ORR across indications.
Keywords
Oncology
Overall response rate
Tumor growth dynamics
Overall survival
Co-Author(s)
Shubhadeep Chakraborty, Bristol Myers Squibb
Izumi Hamada, Bristol Myers Squibb
Kshitij Aggarwal, Bristol Myers Squibb
Chuanpu Hu, Bristol Myers Squibb
Anna Kondic, Bristol Myers Squibb
David Palucchi, Bristol Myers Squibb
Arun Kumar, Bristol Myers Squibb
Kaushal Mishra
Ram Tiwari, Bristol Myers Squibb
Mariann Micsinai Balan, Bristol Myers Squibb
Kalyanee Viraswami-Appanna, Bristol Myers Squibb
First Author
Marzana Chowdhury
Presenting Author
Marzana Chowdhury
To address the need for novel interactive visualization tools and databases for characterizing multimorbidity patterns across different populations, we developed PheMIME and Phe-OmicsMIME. Integrating data from three large-scale EHR systems and genetic biobanks (VUMC, MGB, and UK Biobank), these tools perform statistical network analysis for efficient visualization and disease subtype analysis, uncovering robust and novel disease links that are interoperable across systems to aid personalized medicine. PheMIME integrates phenome-wide association (PheWAS) summary statistics and incorporates an enhanced version of associationSubgraphs, enabling dynamic inference of disease clusters. Phe-OmicsMIME predicts multi-omics traits from genetic biobank data and integrates hazard ratios from time-to-event PheWAS, connecting gene-protein-metabolite-disease relationships. It facilitates exploration of the biomolecule-disease bipartite network and provides compelling evidence of shared pathways among diseases. These tools stand out as the first of their kind to offer extensive disease subtype knowledge integration with substantial support for efficient online analysis and interactive visualization.
Keywords
interactive visualization
UK Biobank and Electronic Health Records (EHR)
network analysis and data science
interoperability and reproducibility
multimorbidity and PheWAS
personalized medicine
Co-Author(s)
Nick Strayer, Posit PBC
Tess Vessels, Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center
Dan Roden, Department of Pharmacology, Vanderbilt University Medical Center
Douglas Ruderfer, Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center
Yaomin Xu, Vanderbilt University Medical Center
First Author
Siwei Zhang, Vanderbilt University Medical Center
Presenting Author
Siwei Zhang, Vanderbilt University Medical Center
Microbiome data consist of the relative abundances of hundreds of taxon counts. Such data, generally referred to as high-dimensional compositional data, inherently violate independence assumptions because of the sum-to-constant constraint. In particular, feature selection with control of the false discovery rate (FDR) needs to be examined, because multiple testing procedures commonly assume independent p-values. While log-ratio transformations may satisfy some assumptions, interpreting the transformed variables can be challenging in practice. For practical and useful variable selection, inference should rely on the original scale with no transformation, naturally leading us to investigate the impact of compositional responses. The literature documents FDR-based inference under dependency; however, this approach tends to be conservative, especially when assuming all null hypotheses are true. In this study, we aim to identify the weak dependency conditions under which the usual FDR procedure is effective with compositional responses in microbiome data. We provide guidelines on when to modify the FDR procedure under dependency.
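The "usual FDR procedure" here is Benjamini-Hochberg, and the conservative dependency-robust variant in the literature is Benjamini-Yekutieli, which inflates the threshold by the harmonic sum. A compact sketch of both, expressed as adjusted p-values (a hypothetical helper, not the study's software):

```python
def fdr_adjust(pvals, method="bh"):
    """Adjusted p-values for the Benjamini-Hochberg step-up procedure;
    method='by' applies the Benjamini-Yekutieli inflation factor
    sum(1/i), which is valid under arbitrary dependence."""
    n = len(pvals)
    c = sum(1.0 / i for i in range(1, n + 1)) if method == "by" else 1.0
    order = sorted(range(n), key=lambda i: pvals[i])
    adj = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):                 # enforce monotonicity from the top
        i = order[rank - 1]
        running_min = min(running_min, c * n * pvals[i] / rank)
        adj[i] = running_min
    return adj
```

The BY adjustment is never smaller than the BH one, which is exactly the conservativeness under dependency that the study seeks to characterize and avoid when weak dependence suffices.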
Keywords
high-dimensional compositional data
microbiome data
multiple testing
false discovery rate
First Author
Jung Ae Lee, University of Massachusetts Chan Medical School
Presenting Author
Jung Ae Lee, University of Massachusetts Chan Medical School
Kaplan-Meier (KM) curves and log-rank tests are widely used to visualize and compare groups in survival analysis. To reduce confounding effects due to unbalanced covariates, methods such as matching, weighting, and stratification have been used. When comparing a target population to a reference population for the average treatment effect on the treated (ATT), we weight the reference population instead of 1:1 matching on selected covariates to avoid loss of precision. We propose weighted versions of the KM curve and the log-rank test, and show that the weighted KM estimator is consistent. Simulation is used to illustrate performance in comparison with the score test from the Cox proportional hazards model. The proposed method is also applied to compare institutional cancer survival to the national benchmark from the Surveillance, Epidemiology, and End Results (SEER) database in an R Shiny app.
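A weighted Kaplan-Meier estimator of the kind proposed replaces the counts in the product-limit formula with sums of subject weights. A minimal sketch (unit weights recover the ordinary KM curve; the function name and data layout are hypothetical):

```python
def weighted_km(times, events, weights):
    """Weighted Kaplan-Meier: at each observed event time, multiply the
    survival estimate by (1 - weighted deaths / weighted number at risk).
    Unit weights recover the ordinary product-limit estimator."""
    surv, s = [], 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        deaths = sum(w for ti, e, w in zip(times, events, weights) if ti == t and e)
        s *= 1.0 - deaths / at_risk
        surv.append((t, s))
    return surv
```

For ATT comparisons, treated subjects would receive weight 1 while reference subjects receive odds-of-treatment weights from a propensity model, so the weighted reference curve estimates survival the treated group would have had under the reference condition.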
Keywords
Weighted log-rank test
survival outcome
Kaplan-Meier curve
matching