Sunday, Aug 3: 4:00 PM - 5:50 PM
0352
Invited Paper Session
Music City Center
Room: CC-201A
Applied
Yes
Main Sponsor
ENAR
Co Sponsors
Biometrics Section
International Chinese Statistical Association
Presentations
Integrating multiple observational studies to make unconfounded causal or descriptive comparisons of group potential outcomes in a large natural population is challenging. Moreover, retrospective cohorts, being convenience samples, are usually unrepresentative of the natural population of interest and have groups with unbalanced covariates. We propose a general covariate-balancing framework based on pseudo-populations that extends established weighting methods to the meta-analysis of multiple retrospective cohorts with multiple groups. Additionally, by maximizing the effective sample sizes of the cohorts, we propose a FLEXible, Optimized, and Realistic (FLEXOR) weighting method appropriate for integrative analyses. We develop new weighted estimators for unconfounded inferences on wide-ranging population-level features and estimands relevant to group comparisons of quantitative, categorical, or multivariate outcomes. Asymptotic properties of these estimators are examined. Through simulation studies and meta-analyses of TCGA datasets, we demonstrate the versatility and reliability of the proposed weighting strategy, especially for the FLEXOR pseudo-population. An R package has been developed for disseminating the methods.
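As a rough illustration of the effective sample size that the abstract says is maximized, the sketch below uses Kish's formula, (sum of weights)^2 / (sum of squared weights). This is a generic quantity from the weighting literature, not the FLEXOR procedure itself; the propensity values and function names are hypothetical.

```python
def effective_sample_size(weights):
    """Kish's effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def weighted_mean(values, weights):
    """Weighted average of an outcome under a given set of balancing weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Hypothetical propensity scores for four subjects in one group (assumption).
propensity = [0.2, 0.5, 0.8, 0.5]
outcome = [1.0, 2.0, 3.0, 4.0]

# Inverse probability weights: one common choice of balancing weights.
ipw = [1.0 / p for p in propensity]
uniform = [1.0] * len(propensity)

ess_uniform = effective_sample_size(uniform)  # equal weights lose no information
ess_ipw = effective_sample_size(ipw)          # unequal weights shrink the ESS
mean_ipw = weighted_mean(outcome, ipw)
```

The more variable the weights, the smaller the effective sample size, which is why a weighting scheme that keeps it large can yield more stable weighted estimators.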
Keywords
causal inference
integrative analysis
observational studies
data integration
Speaker
Yi Li, University of Michigan
Allele frequency estimation at a genetic marker plays a pivotal role in genetic studies. The accuracy of allele frequency estimation affects the accuracy and power of a genome-wide association study (GWAS). Moreover, allele frequency may differ between seemingly similar populations, which makes allele frequency estimation particularly important for identifying ancestry informative markers (AIMs). Yet existing allele frequency estimation methods mostly rely on independent samples from a homogeneous population and cannot provide closed-form solutions for the maximum likelihood estimator (MLE) of the allele frequencies. To address these challenges, we propose a retrospective regression framework that takes genotype as the response variable and population and other covariates as the independent variables. The regression nature of our proposed method enables it to estimate allele frequency in heterogeneous populations and accommodate sample correlation. We support our analytical findings using the 1000 Genomes Project genotype data of five super-populations.
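To make the regression view concrete, here is a minimal sketch of the simplest special case, assuming independent samples and population label as the only covariate. Regressing genotype (coded 0, 1, or 2 copies of an allele) on population indicators then reduces to per-population means, and half of each mean is the familiar allele-counting estimate. It does not handle additional covariates or sample correlation, which the proposed framework is said to accommodate; the population labels and data are hypothetical.

```python
from collections import defaultdict


def allele_frequencies_by_population(genotypes, populations):
    """Estimate allele frequency per population by allele counting.

    genotypes: allele copy counts per individual, each 0, 1, or 2.
    populations: population label for each individual.
    """
    allele_counts = defaultdict(int)
    sample_counts = defaultdict(int)
    for genotype, population in zip(genotypes, populations):
        if genotype not in (0, 1, 2):
            raise ValueError("genotype must be 0, 1, or 2")
        allele_counts[population] += genotype
        sample_counts[population] += 1
    # Each diploid individual carries two alleles, hence the factor of 2.
    return {
        population: allele_counts[population] / (2 * sample_counts[population])
        for population in sample_counts
    }


# Hypothetical genotypes for two populations (illustrative data only).
genotypes = [0, 1, 2, 1, 2, 2, 1, 2]
populations = ["POP1"] * 4 + ["POP2"] * 4
frequencies = allele_frequencies_by_population(genotypes, populations)
```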
Studying and understanding disease processes based on registry data or electronic medical records is challenging because disease processes evolve in continuous time, but individuals are only seen upon encounters with the healthcare system. We consider issues in the study of chronic diseases where a dynamic marker is associated with both disease progression and the prescription of treatment, but information is only available at intermittent clinic visits that may also be driven by the marker process. Through a joint model of these processes, we identify different facets of the confounding that arise in some standard and more involved analyses and discuss identifiability issues when aiming to fit comprehensive models. Remarks on causal analyses for both the potential outcomes and Granger schools are also made. This is joint work with Richard Cook and Jerry Lawless.
Keywords
dynamic biomarker
time-dependent confounding
intermittent observation
multistate model
intensity functions
estimands
Conducting observational postmarket medical product safety surveillance is important for detecting rare adverse events not identified pre-licensure. New systems (e.g., the CDC Vaccine Safety Datalink and the FDA Sentinel Initiative) have been built to conduct safety surveillance using electronic healthcare data (claims and electronic medical records) from multiple healthcare systems. These systems keep individual patient data within each health plan and establish a distributed data network that shares deidentified or limited data to answer important safety questions about new medical products. I will present several approaches our team has developed that are tailored to these networks: they control for confounding, are appropriate for rare events, and work within a distributed data network to protect patient privacy. I will focus the presentation on acute adverse events yielding binary outcomes, but will discuss extensions of this work to survival outcomes. I will show results from a simulation study and from an application to a real vaccine safety study.
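The distributed-network idea can be sketched as follows: each site reduces its patient-level records to aggregate counts, and only those counts leave the site. The example combines site-level 2x2 tables with the classical Mantel-Haenszel odds ratio, stratifying on site. This is a textbook estimator chosen for illustration, not necessarily one of the approaches presented in the talk; the function names and counts are hypothetical.

```python
def site_summary(records):
    """Run inside a site: reduce (exposed, event) records to 2x2 counts.

    Returns (a, b, c, d): exposed with event, exposed without event,
    unexposed with event, unexposed without event.
    """
    a = b = c = d = 0
    for exposed, event in records:
        if exposed and event:
            a += 1
        elif exposed:
            b += 1
        elif event:
            c += 1
        else:
            d += 1
    return (a, b, c, d)


def mantel_haenszel_odds_ratio(tables):
    """Run at the coordinating center: pool site tables, stratified by site."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty site contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no informative strata")
    return numerator / denominator


# Patient-level records never leave the site; only the counts are shared.
example_counts = site_summary(
    [(True, True), (True, False), (False, True), (False, False), (False, False)]
)

# Hypothetical aggregate tables reported by two sites with rare events.
shared_tables = [(2, 98, 1, 99), (4, 196, 2, 198)]
pooled_or = mantel_haenszel_odds_ratio(shared_tables)
```

In practice the strata would typically be finer than site alone (for example, site by confounder category), so that the pooled estimate also adjusts for measured confounders.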
Keywords
Distributed Data
Safety Surveillance
Speaker
Andrea Cook, Kaiser Permanente Washington Health Research Institute