Recent Advances in Statistical Methods for Integrative Analyses of Healthcare Data

Subharup Guha Chair
University of Florida
 
Subharup Guha Organizer
University of Florida
 
Sunday, Aug 3: 4:00 PM - 5:50 PM
0352 
Invited Paper Session 
Music City Center 
Room: CC-201A 

Applied

Yes

Main Sponsor

ENAR

Co Sponsors

Biometrics Section
International Chinese Statistical Association

Presentations

Causal Meta-Analysis by Integrating Multiple Observational Studies

Integrating multiple observational studies to make unconfounded causal or descriptive comparisons of group potential outcomes in a large natural population is challenging. Moreover, retrospective cohorts, being convenience samples, are usually unrepresentative of the natural population of interest and have groups with unbalanced covariates. We propose a general covariate-balancing framework based on pseudo-populations that extends established weighting methods to the meta-analysis of multiple retrospective cohorts with multiple groups. Additionally, by maximizing the effective sample sizes of the cohorts, we propose a FLEXible, Optimized, and Realistic (FLEXOR) weighting method appropriate for integrative analyses. We develop new weighted estimators for unconfounded inferences on wide-ranging population-level features and estimands relevant to group comparisons of quantitative, categorical, or multivariate outcomes. Asymptotic properties of these estimators are examined. Through simulation studies and meta-analyses of TCGA datasets, we demonstrate the versatility and reliability of the proposed weighting strategy, especially for the FLEXOR pseudo-population. An R package has been developed for disseminating the methods.  
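As a rough illustration of the effective sample size that the abstract says is maximized, the sketch below uses Kish's formula, (sum of weights)^2 / (sum of squared weights). Whether FLEXOR uses exactly this definition is an assumption here, and the function name and example weights are hypothetical; this is not the authors' R package.

```python
def effective_sample_size(weights):
    """Kish's effective sample size: (sum w)^2 / (sum w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be nonzero")
    return total * total / total_sq


# Hypothetical weights, for illustration only.
uniform_weights = [1.0] * 100          # every subject counts equally
skewed_weights = [10.0] + [1.0] * 99   # one subject dominates

ess_uniform = effective_sample_size(uniform_weights)
ess_skewed = effective_sample_size(skewed_weights)
print(ess_uniform, ess_skewed)
```

With equal weights the effective sample size equals the number of subjects; a single large weight reduces it sharply, which is why a weighting scheme that keeps this quantity high yields more stable weighted estimators.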

Keywords

causal inference

integrative analysis

observational studies

data integration 

Speaker

Yi Li, University of Michigan

Allele-frequency estimation and ancestry informative marker identification via retrospective regression

Allele frequency estimation at a genetic marker plays a pivotal role in genetic studies. The accuracy of allele frequency estimation affects the accuracy and power of a genome-wide association study (GWAS). Moreover, allele frequencies may differ between seemingly similar populations, which makes allele frequency estimation particularly important for identifying ancestry informative markers (AIMs). Yet existing allele frequency estimation methods mostly rely on independent samples from a homogeneous population and cannot provide closed-form solutions for the maximum likelihood estimator (MLE) of the allele frequencies. To address these challenges, we propose a retrospective regression framework that takes genotype as the response variable and population and other covariates as the independent variables. The regression nature of our proposed method enables it to estimate allele frequencies in heterogeneous populations and accommodate sample correlation. We support our analytical findings using the 1000 Genomes Project genotype data of five super-populations.
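The idea of treating genotype as the response can be sketched in its simplest special case. Assuming genotypes are coded as 0, 1, or 2 copies of the allele, independent samples, and population membership as the only covariate, a least-squares fit with one indicator per population reduces to the group mean genotype, so the estimated allele frequency is that mean divided by 2. The function name and toy data are hypothetical, and the sketch omits the other covariates and sample correlation that the proposed framework is said to handle.

```python
from collections import defaultdict


def allele_frequencies(genotypes, populations):
    """Per-population allele frequency: mean genotype / 2.

    Equivalent to least squares of genotype on population
    indicators (no intercept) under the assumptions stated above.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for genotype, population in zip(genotypes, populations):
        if genotype not in (0, 1, 2):
            raise ValueError("genotype must be 0, 1, or 2 allele copies")
        sums[population] += genotype
        counts[population] += 1
    return {pop: sums[pop] / (2 * counts[pop]) for pop in counts}


# Toy data, for illustration only: two hypothetical populations A and B.
genotypes = [0, 1, 2, 1, 0, 0, 1, 0]
populations = ["A", "A", "A", "A", "B", "B", "B", "B"]

freqs = allele_frequencies(genotypes, populations)
print(freqs)
```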

Speaker

Lin Zhang, Simon Fraser University

Joint modeling of biomarkers, treatment initiation and disease progression with incomplete data

Studying and understanding disease processes based on registry data or electronic medical records is challenging because disease processes evolve in continuous time but individuals are only seen upon encounters with the healthcare system. We consider issues in the study of chronic diseases where a dynamic marker is associated with both disease progression and the prescription of treatment, but information is only available at intermittent clinic visits that may also be driven by the marker process. Through this joint model we identify different facets of the confounding that arise from some standard and more involved analyses and discuss identifiability issues when aiming to fit comprehensive models. Remarks on causal analyses for both the potential outcomes and Granger schools are also made. This is joint work with Richard Cook and Jerry Lawless.

Keywords

dynamic biomarker

time-dependent confounding

intermittent observation

multistate model

intensity functions

estimands 

Co-Author

Richard Cook, University of Waterloo

Speaker

Lily Zou

Methods for using Multi-Health Plan EHR data to conduct Active Safety Surveillance Studies of New Medical Products

Conducting observational postmarket medical product safety surveillance is important for detecting rare adverse events not identified pre-licensure. New systems (e.g., CDC Vaccine Safety Datalink and FDA Sentinel Initiative) have been built to conduct safety surveillance using electronic healthcare data (claims and electronic medical records) from multiple healthcare systems. These systems keep individual patient data within each health plan and establish a distributed data network that shares deidentified or limited data to answer important safety questions about new medical products. I will present several approaches our team has developed that are tailored to these networks: they control for confounding, are appropriate for rare events, and work within a distributed data network to protect patient privacy. I will focus the presentation on acute adverse events yielding binary outcomes but will also discuss extensions of this work to survival outcomes. I will show results from a simulation study and from an application to a real vaccine safety study.
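One generic way to analyze binary outcomes without moving patient-level records is for each site to share only aggregate counts, which a coordinating center then pools. The sketch below uses the classical Mantel-Haenszel risk ratio stratified by site as a stand-in; it is not the speaker's method, and the site names, field names, and counts are hypothetical.

```python
def mantel_haenszel_risk_ratio(site_counts):
    """Pool site-level 2x2 counts into a Mantel-Haenszel risk ratio.

    Each entry holds only aggregates: events and totals for the
    exposed and unexposed groups at one site.
    """
    numerator = 0.0
    denominator = 0.0
    for site in site_counts:
        n_total = site["exposed_total"] + site["unexposed_total"]
        numerator += site["exposed_events"] * site["unexposed_total"] / n_total
        denominator += site["unexposed_events"] * site["exposed_total"] / n_total
    if denominator == 0:
        raise ValueError("no unexposed events; risk ratio is undefined")
    return numerator / denominator


# Hypothetical aggregate counts from two sites, for illustration only.
site_counts = [
    {"site": "plan_1", "exposed_events": 4, "exposed_total": 1000,
     "unexposed_events": 2, "unexposed_total": 1000},
    {"site": "plan_2", "exposed_events": 6, "exposed_total": 2000,
     "unexposed_events": 3, "unexposed_total": 2000},
]

pooled_rr = mantel_haenszel_risk_ratio(site_counts)
print(pooled_rr)
```

Stratifying by site adjusts only for site-level differences; controlling for patient-level confounders and handling very rare events, as the abstract describes, would require more than this sketch provides.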

Keywords

Distributed Data

Safety Surveillance 

Speaker

Andrea Cook, Kaiser Permanente Washington Health Research Institute