A Principled Approach Using Subgroup Mixable Estimation for Logic-respecting Causal Estimands, Including Making Stratified Conditional Efficacy Exactly Equal to Marginal Efficacy for Time-to-event Outcomes

Xinping Cui Chair
University of California-Riverside
 
Xueliang Pan Discussant
The Ohio State University
 
Jason Hsu Organizer
The Ohio State University
 
Ying Ding Organizer
University of Pittsburgh
 
Monday, Aug 4: 10:30 AM - 12:20 PM
0715 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-101B 

Applied: Yes

Main Sponsor

Lifetime Data Science Section

Co Sponsors

Biopharmaceutical Section
International Chinese Statistical Association

Presentations

Statistical Decision-Making should control the Incorrect Decision Rate

Statistics is truly a science of facilitating data-informed decisions, that is, of controlling and minimizing the Incorrect Decision rate (such as the rate of incorrectly targeting a patient subgroup). I would like to share some intriguing observations regarding commonly used procedures.
One interesting finding is that the log-rank test does not strongly control the Familywise Type I error rate, when the log-rank test is viewed as a test combining N individual 2x2 contingency table tests (where N is the number of unique event times). The danger of making an incorrect decision upon the rejection of a log-rank test is classic: interpreting the cause of the rejection post hoc. This is the reason strong control of the Familywise Type I error rate is usually required. Thus, in survival analysis, to control the Incorrect Decision rate, decision-making should also incorporate a logic-respecting efficacy measure such as the ratio of survival times.
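
To make the 2x2-table view concrete, here is a minimal sketch (assuming NumPy; my illustration of the construction, not code from the talk) that assembles the log-rank Z statistic from one 2x2 table per unique event time; time, event, and group are hypothetical arrays coding follow-up time, event indicator, and treatment arm:

    import numpy as np

    def logrank_z(time, event, group):
        # One 2x2 table per unique event time: rows = arm, columns = event
        # yes/no, among subjects still at risk.  Summing observed-minus-expected
        # events in arm 1, and the hypergeometric variances, over all tables
        # gives the usual log-rank Z statistic.
        time, event, group = map(np.asarray, (time, event, group))
        o_minus_e, var = 0.0, 0.0
        for t in np.unique(time[event == 1]):
            at_risk = time >= t                          # risk set at time t
            n = at_risk.sum()                            # total at risk
            n1 = (at_risk & (group == 1)).sum()          # at risk in arm 1
            d = ((time == t) & (event == 1)).sum()       # events at t
            d1 = ((time == t) & (event == 1) & (group == 1)).sum()
            o_minus_e += d1 - d * n1 / n                 # observed - expected
            if n > 1:
                var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        return o_minus_e / np.sqrt(var)                  # ~ N(0,1) under H0

Each table contributes its own observed-minus-expected count, which is why a single rejection of the combined statistic does not by itself identify which event times, or which feature of the survival curves, caused it.
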
In general, an inflated Incorrect Decision rate when the Type I error rate is seemingly controlled is due to "rejecting for the wrong reasons" not being counted. I will give an illustrative example showing that, in 2-sided testing involving multiple doses and endpoints, there is a danger of incorrectly labeling a compound as having efficacy in a secondary endpoint when it has none, if directional errors are not counted as Type I errors. This could happen with alternative primary endpoints. Efficacy is directional, so in 2-sided testing, directional errors should be counted. Confidence set methods are recommended, because they automatically control the directional error rate.
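
A toy Monte Carlo (my own sketch with assumed normal test statistics, not an example from the talk) of how uncounted directional errors let a false secondary efficacy claim through: the primary endpoint is truly harmful and the secondary truly null, yet a fixed-sequence 2-sided procedure that passes alpha upon any primary rejection still claims secondary efficacy about alpha/2 of the time, while a confidence-set rule requiring demonstrated benefit at each step almost never does:

    import numpy as np

    rng = np.random.default_rng(2025)
    reps, c = 200_000, 1.96                 # two-sided 5% critical value
    z1 = rng.normal(-4.0, 1.0, reps)        # primary endpoint: truly harmful
    z2 = rng.normal(0.0, 1.0, reps)         # secondary endpoint: truly null

    # Fixed-sequence 2-sided testing: ANY primary rejection opens the gate,
    # even one caused by harm; secondary "efficacy" is then claimed whenever
    # the secondary rejects with a positive estimate.
    two_sided_claim = (np.abs(z1) > c) & (z2 > c)

    # Confidence-set version: proceed only if the primary interval shows
    # benefit (entirely above 0, i.e. z1 > c); same requirement downstream.
    confidence_set_claim = (z1 > c) & (z2 > c)

    print("false secondary claims, 2-sided gatekeeping:",
          two_sided_claim.mean())           # about 0.025 = alpha/2
    print("false secondary claims, confidence sets:",
          confidence_set_claim.mean())      # essentially 0
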
Statisticians are guardians of science; that is where we shine the most.
 

Speaker

Hong Tian, BeiGene

The Subgroup Mixable Estimation principle makes the stratified conditional HR match the marginal HR, and cures an oversight in the stratified analyses of computer packages

I will start by simply deriving the connection between the Hazard Ratio (HR) and the Living Longer Probability (LLP) using Cox's original Mann-Whitney testing definition of HR, a connection that not only medical officers can understand but that also makes all subsequent explanations easier.
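In LaTeX, the connection as I understand it (the standard proportional-hazards calculation behind the Mann-Whitney definition; theta denotes the HR of treatment versus control, so this sketch is only as good as the proportional hazards assumption):

    % Under proportional hazards, S_trt(t) = S_ctl(t)^theta.
    % Substituting u = S_ctl(t) in the Mann-Whitney probability:
    \[
      \mathrm{LLP} \;=\; P(T_{\mathrm{trt}} > T_{\mathrm{ctl}})
        \;=\; \int_0^\infty S_{\mathrm{trt}}(t)\, f_{\mathrm{ctl}}(t)\, dt
        \;=\; \int_0^1 u^{\theta}\, du
        \;=\; \frac{1}{1+\theta},
      \qquad
      \theta \;=\; \frac{1-\mathrm{LLP}}{\mathrm{LLP}}.
    \]

An HR of 0.6, for example, says a treated patient outlives a control patient with probability 1/1.6 = 0.625, phrasing that a medical officer can act on directly.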

Liu et al (2022, Biometrical Journal 64:198–224) showed that mixing HRs across subgroups dilutes them if the biomarker has a Prognostic effect, giving the illusion that patients with high and with low biomarker values each benefit more from the new treatment than the overall population does. The key message of that paper is that patients with middling biomarker values may be unfairly deprived of lifesaving new treatments.
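
The dilution is easy to reproduce (a self-contained sketch with made-up numbers, assuming the Python lifelines package; not the paper's data): two strata share the conditional HR 0.6 but have very different baseline hazards, and the pooled Cox fit that ignores the stratum reports an HR noticeably closer to 1:

    import numpy as np, pandas as pd
    from lifelines import CoxPHFitter   # assumed available: pip install lifelines

    rng = np.random.default_rng(0)
    m = 10_000                      # patients per stratum-by-arm cell
    hr = 0.6                        # the SAME conditional HR in both strata
    base = {0: 1.0, 1: 0.2}         # prognostic effect: different baseline hazards

    rows = []
    for s in (0, 1):
        for treat in (0, 1):
            lam = base[s] * (hr if treat else 1.0)
            rows.append(pd.DataFrame({
                "time": rng.exponential(1 / lam, m),
                "event": 1, "treat": treat, "stratum": s}))
    df = pd.concat(rows, ignore_index=True)

    # Conditional fits recover HR ~ 0.6 in each stratum ...
    for s in (0, 1):
        fit = CoxPHFitter().fit(df[df.stratum == s][["time", "event", "treat"]],
                                "time", "event")
        print(f"stratum {s} HR:", float(np.exp(fit.params_["treat"])))

    # ... but the pooled fit, ignoring the prognostic stratum, is diluted toward 1.
    fit = CoxPHFitter().fit(df[["time", "event", "treat"]], "time", "event")
    print("pooled HR:", float(np.exp(fit.params_["treat"])))
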

Reversing the thought process for HR, separating patients into subgroups by prognostic factors boosts the apparent efficacy of the new treatment in each stratum (the sketch above shows exactly this: the stratified fits recover the conditional HR while the pooled fit is closer to 1). In turn, if a stratified analysis of ratio efficacy ignores the prognostic effect (as all current computer packages do), then a sponsor can game the system by stratifying on as many prognostic factors as possible to artificially lower the HR for the overall population, a danger to public health.

The Subgroup Mixable Estimation (SME) principle takes the prognostic effect into account to properly mix ratio efficacy across subgroups, removing such danger. In the case of HR, Liu et al (2022, Biometrical Journal 64:246–255) describe how SME mixes by first converting each conditional HR to an LLP, then mixing the LLPs, and finally converting the mixed/unconditional LLP back to obtain the marginal HR. Using a real data set, I will show that SME mixing in fact makes the conditional HRs and the marginal HR equal in the population/parameter space, dispelling the notion that they are apples and oranges.
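
A sketch of the recipe (my reading of it: the weights below are taken to be subgroup prevalences, and the HR-to-LLP conversion uses the proportional-hazards identity LLP = 1/(1+HR) derived above; the paper's exact estimator may weight differently):

    import numpy as np

    def sme_marginal_hr(hr_conditional, weights):
        # HR -> LLP in each subgroup, mix the LLPs (probabilities mix
        # linearly), then convert the mixed LLP back to a marginal HR.
        hr_conditional = np.asarray(hr_conditional, dtype=float)
        weights = np.asarray(weights, dtype=float)
        llp = 1.0 / (1.0 + hr_conditional)    # conditional HR -> conditional LLP
        llp_mixed = weights @ llp             # mix on the mixable (LLP) scale
        return (1.0 - llp_mixed) / llp_mixed  # mixed LLP -> marginal HR

    print(sme_marginal_hr([0.6, 0.6], [0.5, 0.5]))  # 0.6: conditional = marginal
    print(sme_marginal_hr([0.5, 0.8], [0.7, 0.3]))  # about 0.58

Note the logic-respecting behavior: if every subgroup has conditional HR 0.6, the mixed marginal HR is exactly 0.6, unlike the diluted pooled estimate in the earlier sketch.
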

Keywords

Hazard Ratio (HR)

Dilution in mixing due to Prognostic effect

Mann-Whitney testing

Living Longer Probability (LLP)

Subgroup Mixable Estimation (SME) principle

Conditional HR equal to Marginal HR

Speaker

Jason Hsu, The Ohio State University

Principled Methods for Estimating Conditional and Marginal Efficacy Measures in Time-to-Event Outcomes Using Web-Interactive Shiny Apps

As a continuation of the first presentation, we focus on time-to-event outcomes using Hazard Ratios (HR) and Time Ratios (TR) as efficacy measures in a heterogeneous patient population with differential treatment efficacy across biomarker subgroups. We present interactive web-based Shiny applications that implement principled methods to estimate conditional and marginal efficacy measures under the proportional hazards model (for HR) and the accelerated failure time model (for TR).
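
A minimal sketch of the two model fits behind the apps (assuming the Python lifelines package and a hypothetical data frame df with columns time, event, treat, and biomarker; the apps themselves are Shiny-based, and this is only an outline of the estimation step):

    import numpy as np
    from lifelines import CoxPHFitter, WeibullAFTFitter  # assumed available

    def subgroup_hr_and_tr(df):
        # df: one row per patient, columns time, event, treat, biomarker (0/1);
        # a hypothetical stand-in for the OAK / onartuzumab data sets.
        out = {}
        for b, sub in df.groupby("biomarker"):
            sub = sub[["time", "event", "treat"]]
            cph = CoxPHFitter().fit(sub, "time", "event")       # PH model
            aft = WeibullAFTFitter().fit(sub, "time", "event")  # AFT model
            out[b] = {
                "HR": float(np.exp(cph.params_["treat"])),
                # time ratio = exp(treat coefficient on the log-time
                # (lambda_) part of the Weibull AFT model)
                "TR": float(np.exp(aft.params_[("lambda_", "treat")])),
            }
        return out
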

To illustrate these methods, we analyze data from two cancer clinical trials. The first dataset comes from the OAK study, a randomized, open-label, phase 3 trial evaluating the efficacy of atezolizumab, a humanized anti-PD-L1 treatment, versus docetaxel in previously treated patients with non-small-cell lung cancer. The biomarker of interest is PD-L1 expression (≥1% vs. <1% on tumor cells or tumor-infiltrating immune cells). The second dataset is from Genentech's randomized phase II trial investigating onartuzumab combined with erlotinib versus erlotinib alone in patients with advanced non-small-cell lung cancer, where the biomarker is MET status determined by immunohistochemistry (IHC). Both studies use overall survival as the time-to-event outcome.
 

Co-Author(s)

Jiaqian Liu, University of Pittsburgh
Ying Ding, University of Pittsburgh

Speaker

Jiaqian Liu, University of Pittsburgh

Simulation-based Estimation of Relative Risk in Pharmacometric Analyses: How Much Do We Know About My Virtual Twin?

Pharmacometric "population simulations" are often used to determine whether special subpopulations, such as the renally or hepatically impaired, have elevated risk of adverse events as a result of elevated pharmacokinetic exposure to a drug. When such simulations are summarized using non-comparative statistics, e.g. P( AE if given high dose ) and P( AE if given low dose ), this simulation-based methodology conforms to the logic of "standardization" / g-formula, and therefore results in valid estimates if the outcome models are correct (1). On the other hand, when such simulations are used to obtain estimates of the causal relative risk P( AE if given high dose ) / P( AE if given low dose ), the path forward for the pharmacometric analyst is less clear. We consider two methods of summarizing population simulations in terms of relative risk and evaluate the degree to which each conforms to the principles of subgroup mixable estimation (2). Enhanced dialogue between the pharmacometric, biostatistics, and machine learning communities requires an aligned understanding of this issue as all three communities progress toward increased usage of "virtual twins".


(1) Rogers, James A., Hugo Maas, and Alejandro Pérez Pitarch. 2023. "An Introduction to Causal Inference for Pharmacometricians." CPT: Pharmacometrics & Systems Pharmacology 12 (1): 27–40.

(2) Liu, Yi, Bushi Wang, Miao Yang, Jianan Hui, Heng Xu, Siyoen Kil, and Jason C. Hsu. 2022. "Correct and Logical Causal Inference for Binary and Time-to-Event Outcomes in Randomized Controlled Trials." Biometrical Journal 64 (2): 198–224. 

Speaker

James Rogers, Metrum Research Group