Recent Advances in Design and Analysis of Two-phase Studies

Li Cheung, Chair
National Cancer Institute
 
Fangya Mao, Organizer
 
Monday, Aug 5: 8:30 AM - 10:20 AM
1162 
Invited Paper Session 
Oregon Convention Center 
Room: CC-E144 

Applied

Yes

Main Sponsor

ENAR

Co-Sponsors

International Society for Clinical Biostatistics

Presentations

A Two-Phase Study of Longitudinal Ordinal Outcome Data in Critically Ill Patients with Sepsis

The Crystalloid Liberal or Vasopressor Early Resuscitation in Sepsis (CLOVERS) trial was a randomized trial comparing two resuscitation strategies in patients with sepsis: a restrictive-fluid strategy with early vasopressors and a liberal-fluid strategy. The trial enrolled 1,563 patients and, over 14 days, recorded each patient's daily ordinal outcome state: discharged, hospitalized, in the intensive care unit, or dead. Blood samples were collected from every patient at enrollment and stored for secondary studies. We are interested in the design and analysis of a secondary study of the association between baseline syndecan-1 concentration (a biomarker of glycocalyx degradation) and the ordinal outcome states over time. Because of budget constraints, the biomarker could be measured retrospectively in only 600 of the 1,563 enrolled patients. We will discuss how to use the available data to decide whom to sample, and then how to analyze the sample so that the results generalize to the full cohort and the analysis extracts as much information from the data as possible.
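As a minimal sketch of the "results generalize" step (a generic technique, not the speakers' method), the Python below simulates a hypothetical cohort of 1,563 patients stratified by the worst ordinal state reached, draws a stratified Phase II sample of 600 with equal allocation, and weights each sampled patient by the inverse of their selection probability, N_s/n_s. The strata, counts, biomarker model, allocation, and the helper `stratified_phase2_sample` are all hypothetical. The weighted mean recovers the cohort mean, while the unweighted mean is pulled toward the oversampled severe states.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical Phase I cohort: 1,563 patients labeled by the worst ordinal state
# reached during follow-up (0 = discharged, ..., 3 = dead). Counts are made up.
N_S = [703, 391, 281, 188]
strata = rng.permutation(np.repeat(np.arange(4), N_S))

# Hypothetical biomarker (log syndecan-1), higher on average in more severe strata.
log_sdc1 = 4.0 + 0.5 * strata + rng.normal(0.0, 1.0, size=strata.size)


def stratified_phase2_sample(strata, n_per_stratum, rng):
    """Draw a stratified Phase II sample without replacement.

    Returns the sampled indices and their inverse-probability weights.
    """
    idx, w = [], []
    for s, n_s in n_per_stratum.items():
        members = np.flatnonzero(strata == s)
        idx.append(rng.choice(members, size=n_s, replace=False))
        # Each member of stratum s is selected with probability n_s / N_s,
        # so its design weight is N_s / n_s.
        w.append(np.full(n_s, len(members) / n_s))
    return np.concatenate(idx), np.concatenate(w)


# Equal allocation (150 per stratum, 600 in total) oversamples the severe states.
idx, w = stratified_phase2_sample(strata, {s: 150 for s in range(4)}, rng)

naive_mean = log_sdc1[idx].mean()
weighted_mean = np.sum(w * log_sdc1[idx]) / np.sum(w)
cohort_mean = log_sdc1.mean()  # knowable only because the data are simulated
print(f"cohort {cohort_mean:.3f}  weighted {weighted_mean:.3f}  unweighted {naive_mean:.3f}")
```

Inverse-probability weighting is only one option; likelihood-based analyses that also use the Phase I data on unsampled patients are typically more efficient.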

Speaker

Jonathan Schildcrout, Vanderbilt University

An Optimal Two-step Estimation Approach for Two-phase Studies

Two-phase sampling is commonly used to reduce cost and improve estimation efficiency. We consider a two-phase design in which the outcome and some inexpensive covariates are observed for the whole cohort in Phase I, and expensive covariates are measured only for a selected subset of the cohort in Phase II. Analyzing the association between the outcome and the covariates is therefore a missing-data problem, and a complete-case analysis that uses only the Phase II sample is generally inefficient. In this work, we develop a two-step estimation approach that first obtains an estimator from the complete (Phase II) data and then updates it using an asymptotically mean-zero estimator derived from a working model relating the outcome to the inexpensive covariates, based on the full data. The two-step estimator is asymptotically at least as efficient as the complete-data estimator and is robust to misspecification of the working model. We propose a kernel-based method to construct a two-step estimator that achieves optimal efficiency, and we also develop a simple joint-update approach based on multiple working models that approximates the optimal estimator. We illustrate the proposed method with various outcome models.
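The simplest instance of this update idea is the classical regression (double-sampling) estimator of a mean, sketched below in Python under assumed conditions: a simple random Phase II sample, one cheap covariate Z, and the target E[X]. It is not the authors' kernel-based estimator. The complete-case mean of X is corrected by a multiple of delta = (Phase II mean of Z) - (Phase I mean of Z), which has mean zero whatever the true relationship between X and Z, so the estimator stays consistent even if the working relationship is wrong; the multiplier Cov(theta_c, delta) / Var(delta) minimizes the variance. The helper `two_step_mean` and all simulation settings are hypothetical.

```python
import numpy as np


def two_step_mean(x2, z2, z1):
    """Two-step estimate of E[X] from a two-phase sample.

    x2, z2: expensive and cheap variables on the Phase II subsample.
    z1:     cheap variable on the full Phase I cohort.
    Returns (complete-case estimate, two-step estimate).
    """
    theta_c = x2.mean()            # complete-case (Phase II only) estimator
    delta = z2.mean() - z1.mean()  # mean zero under random Phase II sampling
    # Variance-minimizing coefficient Cov(theta_c, delta) / Var(delta). Under simple
    # random Phase II sampling both share the factor (1/n - 1/N), which cancels and
    # leaves the Phase II regression slope of X on Z.
    S = np.cov(x2, z2)
    b = S[0, 1] / S[1, 1]
    return theta_c, theta_c - b * delta


rng = np.random.default_rng(7)
N, n, rho, reps = 5000, 500, 0.8, 500
cc, ts = [], []
for _ in range(reps):
    # Hypothetical cohort: standard normal Z, X with Var(X) = 1, Corr(X, Z) = rho, E[X] = 1.
    z1 = rng.normal(size=N)
    x1 = 1.0 + rho * z1 + np.sqrt(1.0 - rho**2) * rng.normal(size=N)
    phase2 = rng.choice(N, size=n, replace=False)  # simple random Phase II sample
    est_cc, est_ts = two_step_mean(x1[phase2], z1[phase2], z1)
    cc.append(est_cc)
    ts.append(est_ts)

cc, ts = np.array(cc), np.array(ts)
var_ratio = ts.var() / cc.var()
print(f"mean: complete case {cc.mean():.3f}, two-step {ts.mean():.3f}")
print(f"variance ratio (two-step / complete case): {var_ratio:.2f}")
```

Large-sample theory puts the variance ratio at roughly 1 - rho^2 (1 - n/N), about 0.42 for these settings, so the gain grows with both the correlation and the size of the Phase I cohort.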

Co-Author

Kin Yau Wong, The Hong Kong Polytechnic University

Speaker

Qingning Zhou

Efficient Designs and Analysis of Two-Phase Studies with Longitudinal Binary Data

Researchers who want to understand the relationship between a readily available longitudinal binary outcome and a novel biomarker exposure may face ascertainment costs that limit the sample size. In such settings, two-phase studies can be cost-effective: they let researchers target informative individuals for exposure ascertainment and increase the precision of estimates of time-varying and/or time-fixed exposure coefficients. In this paper, we introduce a novel class of residual-dependent sampling (RDS) designs that select informative individuals using the data available on the longitudinal outcome and inexpensive covariates. Together with the RDS designs, we propose a semiparametric analysis approach that efficiently uses all available data to estimate the parameters, and we describe a numerically stable and computationally efficient EM algorithm for maximizing the semiparametric likelihood. We examine the finite-sample operating characteristics of the proposed approaches through extensive simulation studies and illustrate the usefulness of the proposed RDS designs and analysis method in the Lung Health Study.
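One way to make residual-dependent selection concrete, assuming details the abstract does not give: fit a working logistic model of the binary outcome on time and the inexpensive covariate using Phase I data only, form a standardized residual for each subject, and send the subjects with the most extreme residuals (half from each tail) to Phase II. The working model, residual definition, tail allocation, helper names (`fit_logistic`, `rds_select`), and simulated data are all hypothetical and are not the authors' RDS designs. Because the exposure is omitted from the working model, subjects with extreme residuals tend to have extreme exposures, which is what makes them informative.

```python
import numpy as np


def fit_logistic(X, y, iters=25):
    """Unpenalized logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        # Newton step: (X' W X)^{-1} X' (y - p)
        beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
    return beta


def rds_select(y, X, subject, n_select):
    """Hypothetical RDS rule using Phase I data only.

    subject holds integer labels 0..m-1 (one per observation). Returns the
    selected subject labels and every subject's standardized residual.
    """
    p = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, y)))
    # Subject-level residual: sum of (y - p), scaled by its working-model SD.
    r = np.bincount(subject, weights=y - p) / np.sqrt(np.bincount(subject, weights=p * (1.0 - p)))
    order = np.argsort(r)
    half = n_select // 2
    # Take the lowest `half` and the highest `n_select - half` residuals.
    return np.concatenate([order[:half], order[len(order) - (n_select - half):]]), r


rng = np.random.default_rng(11)
m, T = 1000, 6
subject = np.repeat(np.arange(m), T)
t = np.tile(np.arange(T), m).astype(float)
c = np.repeat(rng.binomial(1, 0.5, size=m), T).astype(float)  # inexpensive covariate
e = rng.normal(size=m)  # expensive exposure, not available at Phase I
u = rng.normal(size=m)  # subject-level random intercept
eta = -1.0 + 0.3 * t + 0.5 * c + 0.8 * e[subject] + u[subject]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)

X = np.column_stack([np.ones_like(t), t, c])  # working model omits the exposure
chosen, r = rds_select(y, X, subject, n_select=200)
print(f"mean exposure, low-residual tail:  {e[chosen[:100]].mean():.2f}")
print(f"mean exposure, high-residual tail: {e[chosen[100:]].mean():.2f}")
```

Selection of this kind depends on the outcome, so the subsequent analysis must account for the design (for example, through a likelihood that also uses the unselected subjects); simply fitting a model to the selected subjects would generally be biased.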

Speaker

Ran Tao, Vanderbilt University Medical Center

Two-phase Designs with Left- or Interval-censored Data from Electronic Health Records

Cancer screening is evolving with advances in biomedical technology, including novel tests for precancers and early cancers. Evaluating a new screening test by testing stored study specimens at scale is costly, and two-phase designs provide a robust framework for addressing this problem. In Phase I, we collect results from the existing screening test and definitive outcomes (e.g., biopsies) at study visits. Disease may be prevalent but undetected at the initial screening, or incident and discovered at follow-up visits, which yields left- and interval-censored time-to-event data. Phase II selects a subset of subjects for the new screening test. Our analysis uses a mixture model, with logistic regression for the risk of prevalent disease and a proportional hazards model for the risk of incident disease. We propose and compare several subsampling schemes. The proposed frameworks are examined through simulations and applied to evaluate p16/Ki-67 dual-stain as a triage screening test for HPV-positive women in cervical precancer screening, using stored specimens and electronic health record data from Kaiser Permanente Northern California (KPNC).
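The mixture model can be written so that every observation type shares one likelihood expression. Assume each subject's data are summarized as (L, R], where L is the last time the subject was confirmed disease-free (L = 0 if never) and R is the first time disease was found (R = 0 for prevalent disease found at baseline, R = infinity if never found). The contribution is then p * 1{L = 0} + (1 - p) * (S(L) - S(R)), where p is the logistic prevalence probability and S is the proportional hazards survival function for incident disease; a left-censored subject (L = 0, disease found at R) thus contributes p + (1 - p)(1 - S(R)), since the disease was either prevalent but missed or incident before R. The Python sketch below evaluates this log-likelihood with an assumed Weibull baseline cumulative hazard; the encoding, the Weibull choice, the helper names (`weibull_surv`, `pim_loglik`), and the parameter values are hypothetical, not the speaker's implementation.

```python
import numpy as np


def weibull_surv(t, lin_pred, shape, scale):
    """S(t | x) = exp(-Lambda0(t) * exp(x'beta)), with an assumed Weibull
    baseline cumulative hazard Lambda0(t) = (t / scale) ** shape."""
    return np.exp(-((t / scale) ** shape) * np.exp(lin_pred))


def pim_loglik(alpha, beta, shape, scale, Z, X, L, R):
    """Log-likelihood of a prevalence-incidence mixture model.

    Z, X: covariates for the logistic (prevalence) and PH (incidence) parts.
    L:    last time known disease-free; 0 means never confirmed disease-free.
    R:    first time disease was found (0 if at baseline, np.inf if never).
    Returns (total log-likelihood, per-subject likelihood contributions).
    """
    p = 1.0 / (1.0 + np.exp(-Z @ alpha))  # P(prevalent disease at baseline)
    lp = X @ beta
    S_L = weibull_surv(L, lp, shape, scale)  # equals 1 when L = 0
    S_R = weibull_surv(R, lp, shape, scale)  # equals 0 when R = inf
    # Prevalent disease is possible only if the subject was never confirmed disease-free.
    lik = p * (L == 0) + (1.0 - p) * (S_L - S_R)
    return np.sum(np.log(lik)), lik


# Four hypothetical subjects (times in years): prevalent at baseline,
# left-censored, interval-censored, and right-censored.
x = np.array([0.0, 1.0, 0.0, 1.0])
Z = np.column_stack([np.ones(4), x])  # prevalence model has an intercept
X = x[:, None]                        # PH model: intercept absorbed in baseline hazard
L = np.array([0.0, 0.0, 3.0, 5.0])
R = np.array([0.0, 3.0, 6.0, np.inf])
alpha = np.array([-3.0, 0.5])
ll, lik = pim_loglik(alpha, np.array([0.7]), shape=1.2, scale=20.0, Z=Z, X=X, L=L, R=R)
print(lik, ll)
```

In practice these parameters would be estimated by maximizing this log-likelihood, with Phase II sampling handled by the chosen design and analysis.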

Speaker

Fangya Mao