Monday, Aug 4: 10:30 AM - 12:20 PM
0715
Topic-Contributed Paper Session
Music City Center
Room: CC-101B
Applied
Yes
Main Sponsor
Lifetime Data Science Section
Co Sponsors
Biopharmaceutical Section
International Chinese Statistical Association
Presentations
Statistics is truly a science of facilitating data-informed decisions – of controlling and minimizing the Incorrect Decision rate (such as the rate of incorrectly targeting a patient subgroup). I would like to share some intriguing observations regarding some commonly used procedures.
One interesting finding is that the log-rank test does not strongly control the Familywise Type I error rate when it is viewed as a test combining N individual 2x2 contingency table tests (where N is the number of unique event times). The danger of making an incorrect decision upon rejection of a log-rank test is classic: interpreting the cause of the rejection post hoc. This is precisely why strong control of the Familywise Type I error rate is usually required. Thus, in survival analysis, to control the Incorrect Decision rate, decision-making should also incorporate a logic-respecting efficacy measure, such as the ratio of survival times.
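The view of the log-rank test as N combined 2x2 tables can be sketched as follows; this is my own minimal illustration, not code from the talk, and the function name and toy data are invented:

```python
import numpy as np

def logrank_statistic(time, event, group):
    """Log-rank chi-square assembled from one 2x2 table per unique event time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group, dtype=int)
    o_minus_e = 0.0
    variance = 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t                      # risk set at time t
        n = at_risk.sum()                        # total number at risk
        n1 = (at_risk & (group == 1)).sum()      # number at risk in group 1
        d = ((time == t) & (event == 1)).sum()   # events at time t
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()  # events in group 1
        o_minus_e += d1 - d * n1 / n             # observed minus expected for this table
        if n > 1:
            # hypergeometric variance of d1 given the table margins
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance             # chi-square statistic with 1 df

stat = logrank_statistic([1, 2, 3, 4], [1, 1, 1, 1], [0, 1, 0, 1])
```

Each pass through the loop is one 2x2 table (event vs no event, group 1 vs group 0) at a single event time; the test then pools their observed-minus-expected counts, which is why a rejection does not by itself identify which tables drove it.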
In general, an inflated Incorrect Decision rate when the Type I error rate is seemingly controlled arises because "rejecting for the wrong reasons" is not counted. I will give an illustrative example showing that, in 2-sided testing involving multiple doses and endpoints, there is a danger of incorrectly labeling a compound as having efficacy in a secondary endpoint when it has none, if the directional error rate is not counted as a Type I error. This could happen with alternative primary endpoints. Efficacy is directional, so in 2-sided testing, directional error should be counted. Confidence set methods are recommended because they automatically control the directional error rate.
Statisticians are guardians of science - it is where we shine the most.
I will start by simply deriving the connection between the Hazard Ratio (HR) and the Living Longer Probability (LLP) using Cox's original Mann-Whitney testing definition of HR, a connection that not only medical officers can understand but that also makes all subsequent explanations easier.
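One standard route to this connection, assuming proportional hazards with hazard ratio $\theta$ (a sketch of the usual textbook derivation, not necessarily the exact argument in the talk), is:

```latex
\[
\mathrm{LLP} = P(T_{\mathrm{trt}} > T_{\mathrm{ctl}})
= \int_0^\infty S_{\mathrm{trt}}(t)\, f_{\mathrm{ctl}}(t)\, dt
= \int_0^1 u^{\theta}\, du
= \frac{1}{1+\theta},
\]
```

where $S_{\mathrm{trt}} = S_{\mathrm{ctl}}^{\theta}$ under proportional hazards and the substitution $u = S_{\mathrm{ctl}}(t)$ turns the integral into a one-line calculation, giving $\mathrm{LLP} = 1/(1+\mathrm{HR})$.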
Liu et al (2022, Biometrical Journal 64:198–224) showed that mixing HRs across subgroups dilutes them if the biomarker has a Prognostic effect, giving the illusion that patients with high and with low biomarker values each benefit more from the new treatment. The key message of that paper is that patients with middling biomarker values may be unfairly deprived of lifesaving new treatments.
Reversing the thought process for HR, separating patients into subgroups by prognostic factors boosts the apparent efficacy of the new treatment in each stratum. In turn, if a stratified analysis of ratio efficacy ignores the prognostic effect (as all current computer packages do), then a sponsor can game the system by stratifying on as many prognostic factors as possible to artificially lower the HR for the overall population, a danger to public health.
The Subgroup Mixable Estimation (SME) principle takes the prognostic effect into account to properly mix ratio efficacy across subgroups, removing this danger. In the case of HR, Liu et al (2022, Biometrical Journal 64:246–255) describe how SME mixes by first converting each conditional HR to an LLP, mixing the LLPs, and then converting the mixed/unconditional LLP back to obtain the marginal HR. Using a real data set, I will show that SME mixing in fact makes the conditional HR and the marginal HR equal in the population/parameter space, dispelling the notion that they are apples and oranges.
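The convert-mix-convert recipe can be sketched in a few lines, assuming the relation LLP = 1/(1+HR) and simple prevalence weights; the function names and the weighting here are illustrative assumptions of mine, not the paper's full methodology:

```python
def hr_to_llp(hr):
    # Living Longer Probability implied by a hazard ratio under proportional hazards
    return 1.0 / (1.0 + hr)

def llp_to_hr(llp):
    # invert LLP = 1 / (1 + HR)
    return (1.0 - llp) / llp

def sme_marginal_hr(subgroup_hrs, weights):
    # SME-style mixing: convert each conditional HR to an LLP,
    # mix the LLPs, then convert the mixed LLP back to a marginal HR
    mixed_llp = sum(w * hr_to_llp(hr) for hr, w in zip(subgroup_hrs, weights))
    return llp_to_hr(mixed_llp)
```

Note that when the conditional HRs are equal, mixing on the LLP scale returns the same value as the marginal HR (e.g. `sme_marginal_hr([0.5, 0.5], [0.4, 0.6])` gives 0.5), consistent with the claim that the conditional and marginal HRs coincide in the parameter space.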
Keywords
Hazard Ratio (HR)
Dilution in mixing due to Prognostic effect
Mann-Whitney testing
Living Longer Probability (LLP)
Subgroup Mixable Estimation (SME) principle
Conditional HR equals Marginal HR
As a continuation of the first presentation, we focus on time-to-event outcomes using Hazard Ratios (HR) and Time Ratios (TR) as efficacy measures in a heterogeneous patient population with differential treatment efficacy across different biomarker subgroups. We present interactive web-based Shiny applications that implement principled methods to estimate conditional and marginal efficacy measures under the proportional hazards model (for HR) and the accelerated failure time model (for TR).
To illustrate these methods, we analyze data from two cancer clinical trials. The first dataset comes from the OAK study, a randomized, open-label, phase 3 trial evaluating the efficacy of atezolizumab, a humanized anti-PD-L1 treatment, versus docetaxel in previously treated patients with non-small-cell lung cancer. The biomarker of interest is PD-L1 expression (≥1% vs. <1% on tumor cells or tumor-infiltrating immune cells). The second dataset is from Genentech's randomized phase II trial investigating onartuzumab combined with erlotinib vs erlotinib alone in patients with advanced non-small-cell lung cancer, where the biomarker is MET status determined by immunohistochemistry (IHC). Both studies use overall survival as the time-to-event outcome.
Pharmacometric "population simulations" are often used to determine whether special subpopulations, such as the renally or hepatically impaired, have elevated risk of adverse events as a result of elevated pharmacokinetic exposure to a drug. When such simulations are summarized using non-comparative statistics, e.g. P( AE if given high dose ) and P( AE if given low dose ), this simulation-based methodology conforms to the logic of "standardization" / g-formula, and therefore results in valid estimates if the outcome models are correct (1). On the other hand, when such simulations are used to obtain estimates of the causal relative risk P( AE if given high dose ) / P( AE if given low dose ), the path forward for the pharmacometric analyst is less clear. We consider two methods of summarizing population simulations in terms of relative risk and evaluate the degree to which each conforms to the principles of subgroup mixable estimation (2). Enhanced dialogue between the pharmacometric, biostatistics, and machine learning communities requires an aligned understanding of this issue as all three communities progress toward increased usage of "virtual twins".
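The ambiguity in summarizing simulations as a relative risk can be seen in a toy standardization, assuming a hypothetical logistic outcome model (all names, coefficients, and data below are invented for illustration): the ratio of standardized risks generally differs from the average of individual-level risk ratios.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)          # hypothetical covariate (e.g. a renal function measure)

def p_ae(dose, x):
    # hypothetical logistic outcome model for P(adverse event | dose, covariate)
    return 1.0 / (1.0 + np.exp(-(-2.0 + 0.8 * dose + 0.5 * x)))

p_hi = p_ae(1.0, x)                  # simulated counterfactual risks under the high dose
p_lo = p_ae(0.0, x)                  # simulated counterfactual risks under the low dose

# Method 1: ratio of g-formula (standardized) risks
rr_standardized = p_hi.mean() / p_lo.mean()

# Method 2: mean of individual-level risk ratios
rr_individual = (p_hi / p_lo).mean()
```

The two summaries answer different questions and do not coincide in general, which is exactly the kind of discrepancy that principled mixing frameworks such as SME are meant to sort out.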
(1) Rogers, James A., Hugo Maas, and Alejandro Pérez Pitarch. 2023. "An Introduction to Causal Inference for Pharmacometricians." CPT: Pharmacometrics & Systems Pharmacology 12 (1): 27–40.
(2) Liu, Yi, Bushi Wang, Miao Yang, Jianan Hui, Heng Xu, Siyoen Kil, and Jason C. Hsu. 2022. "Correct and Logical Causal Inference for Binary and Time-to-Event Outcomes in Randomized Controlled Trials." Biometrical Journal 64 (2): 198–224.