Two-phase Designs with Left- or Interval-censored Data from Electronic Health Records

Speaker: Fangya Mao
 
Monday, Aug 5: 9:50 AM - 10:15 AM
Invited Paper Session 
Oregon Convention Center 
Cancer screening is evolving with advanced biomedical technologies, including novel tests for precancers and early cancers. Large-scale testing of stored study specimens to evaluate new screening tests is costly, and two-phase designs provide a robust framework for addressing this cost. In Phase I, we gather data from the old screening test and definitive outcomes (e.g., biopsies) at study visits. Disease may be prevalent but undetected at the initial screening, or incident and discovered at a follow-up visit, which yields left- and interval-censored time-to-event data. In Phase II, a subset of subjects is selected for the new screening test. Our analysis uses a mixture model, with logistic regression for the risk of prevalent disease and a proportional hazards model for the risk of incident disease. We propose and compare several sub-sampling schemes. The proposed frameworks are examined through simulations and applied to an evaluation of p16/Ki-67 dual-stain as a triage test for HPV-positive women in cervical precancer screening, using stored specimens and electronic health record data from Kaiser Permanente Northern California (KPNC).
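
The mixture model described in the abstract can be illustrated with a minimal sketch of one subject's likelihood contribution. The abstract does not give the likelihood, so everything here is an assumption made for illustration: the cumulative risk is taken as F(t) = p + (1 - p)(1 - S(t)), where p comes from a logistic model and S(t) from a proportional hazards model with a constant (exponential) baseline hazard. The function names and the parameters `alpha`, `beta`, and `rate` are hypothetical and not taken from the talk.

```python
import math


def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def prevalence(x, alpha):
    """P(disease already present at baseline | x); alpha[0] is the intercept."""
    return expit(alpha[0] + sum(a * xi for a, xi in zip(alpha[1:], x)))


def survival(t, x, beta, rate):
    """Incident-disease survival under proportional hazards.

    Assumes an exponential baseline, i.e. cumulative hazard rate * t.
    """
    linear = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(-rate * t * math.exp(linear))


def cumulative_risk(t, x, alpha, beta, rate):
    """F(t) = p + (1 - p) * (1 - S(t)); F(0) equals the prevalence p."""
    p = prevalence(x, alpha)
    return p + (1.0 - p) * (1.0 - survival(t, x, beta, rate))


def log_lik(left, right, x, alpha, beta, rate):
    """Log-likelihood contribution of one subject.

    left  : last time the subject was confirmed disease-free, or None if
            never confirmed disease-free (left-censored observation).
    right : first time disease was detected, or math.inf if never
            detected (right-censored observation).
    """
    f_left = 0.0 if left is None else cumulative_risk(left, x, alpha, beta, rate)
    f_right = 1.0 if right == math.inf else cumulative_risk(right, x, alpha, beta, rate)
    return math.log(f_right - f_left)
```

For example, `log_lik(None, 0.0, x, alpha, beta, rate)` is the contribution of a subject whose disease was found at the initial screen, `log_lik(1.0, 2.0, ...)` is an interval-censored incident case, and `log_lik(3.0, math.inf, ...)` is a subject still disease-free at the last visit. In a two-phase analysis these contributions would additionally need to account for the Phase II sampling scheme, which this sketch does not attempt.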