P05 A two-stage approach to assess treatment effects in clinical trial populations enriched with risk predictions

Conference: ASA Biopharmaceutical Section Regulatory-Industry Statistics Workshop 2024
09/27/2024: 9:45 AM - 10:30 AM EDT
Posters 
Room: White Oak 

Description

Background

A randomized controlled trial is a common clinical trial design used to evaluate a treatment effect by comparing outcomes between a group of subjects receiving the treatment of interest (treatment arm) and a group receiving placebo or routine treatment (control arm). Typically, the trial population is selected from subjects with the target disease, identified based on certain criteria. In practice, however, the selection process can be challenging for several reasons. For example, the stage or severity level of a disease may be insufficient to identify the patients who would benefit from the treatment under study. In addition, to reach the desired sample size, the selection criteria are often defined broadly to include more subjects, which may make the risk of disease in the trial population lower than expected. Consequently, the treatment may appear less effective in the trial population because less contrast is observed between the two arms. In this study, we propose a two-stage approach in which we first build a risk prediction model using control arm data and then enrich the risk in the trial population by weighting subjects in both arms with their risk predictions when assessing the treatment effect.

Methods

Simulated randomized clinical trial (RCT) data with two arms (treatment and control) are generated from real-world observational time-to-event data with treatment and non-treatment groups. To simulate the randomization in an RCT, inverse probability weighting (IPW) targeting the average treatment effect in the treated (ATT) is calculated in the observational data and applied to balance the distributions of baseline risk factors such that the standardized mean differences (SMD) between the two arms are less than 0.1. The total sample size is simulated at 1,000 (815 in the treatment arm and 185 in the control arm), based on the ratio observed in the real-world data. Follow-up ends at the earliest adverse event (AE) or is censored at the end of the study at 90 days. The simulated RCT aims to evaluate whether the treatment provides a better protective effect against AE incidence than the control.
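
The balancing step can be illustrated with a minimal sketch, assuming the propensity scores have already been estimated by some model. The function names and the toy inputs are hypothetical and are not taken from the study; the pooled standard deviation is one common SMD convention, assumed here.

```python
import math

def att_weights(treated, propensity):
    """ATT weights: 1 for treated subjects, ps / (1 - ps) for untreated ones."""
    return [1.0 if t else ps / (1.0 - ps) for t, ps in zip(treated, propensity)]

def weighted_mean_var(values, weights):
    """Weighted mean and (population-style) weighted variance."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var

def weighted_smd(values, treated, weights):
    """Absolute standardized mean difference of one covariate between groups."""
    t = [(v, w) for v, g, w in zip(values, treated, weights) if g]
    c = [(v, w) for v, g, w in zip(values, treated, weights) if not g]
    mean_t, var_t = weighted_mean_var(*zip(*t))
    mean_c, var_c = weighted_mean_var(*zip(*c))
    pooled_sd = math.sqrt((var_t + var_c) / 2.0)  # assumed pooling convention
    return abs(mean_t - mean_c) / pooled_sd

# Toy inputs (hypothetical, for illustration only).
treated = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
covariate = [2.0, 4.0, 2.0, 4.0]
weights = att_weights(treated, propensity)
smd = weighted_smd(covariate, treated, weights)
```

In a check of this kind, each baseline risk factor would be passed through `weighted_smd` and compared against the 0.1 threshold.
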
In the first stage, a risk prediction model is built by fitting Cox models to the control arm data, with AE incidence as the outcome and baseline risk factors as the predictors. By trial design, the control arm represents the basic risk in the study; thus, the individual predicted value from the risk prediction model can be used to evaluate whether a patient is at relatively higher or lower basic risk conditional on the baseline risk factors. In the simulated RCT, a total of nine baseline risk factors are measured: age, BMI, heart rate, systolic arterial blood pressure (SBP), elevated troponin, saturation of peripheral oxygen (SpO2), disease severity, disease history, and cancer history. To obtain a parsimonious risk prediction model with fair classification performance, we adopt forward selection, adding a predictor only when it improves concordance by at least 1%.
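
The selection rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: `score_fn` is a hypothetical callback that would fit a Cox model on the control arm with the given predictors and return its concordance, the 1% threshold is assumed to be an absolute gain of 0.01, and the toy gains are invented for the example.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance; pairs with tied times are skipped for simplicity."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # only subjects with an event can anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

def forward_select(candidates, score_fn, min_gain=0.01, baseline=0.5):
    """Add the best remaining predictor while concordance gains >= min_gain."""
    selected, best = [], baseline  # 0.5 is the concordance of a null model
    remaining = list(candidates)
    while remaining:
        top_score, top_name = max((score_fn(selected + [c]), c) for c in remaining)
        if top_score - best < min_gain:
            break
        selected.append(top_name)
        remaining.remove(top_name)
        best = top_score
    return selected, best

# Toy scoring function with invented additive gains (illustration only).
toy_gain = {"x1": 0.10, "x2": 0.04, "x3": 0.005}

def toy_score(predictors):
    return 0.5 + sum(toy_gain[p] for p in predictors)

selected, best = forward_select(["x1", "x2", "x3"], toy_score)
```

In practice `score_fn` would wrap a Cox fit and a concordance calculation such as `harrell_c` applied to the fitted linear predictor.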

In the second stage, predicted values of basic risk for subjects in both arms are calculated from the risk prediction model built in the first stage. In this study, predicted values are calculated in one of three forms: 1) the linear predictor (LP), where the values are shifted to be all positive by adding an offset constant, 2) the risk score (RS), which is the exponential of the LP, and 3) the inverse Mills ratio of the negative LP (IMR). For all forms, higher values indicate higher basic risk of AE. Sampling-weighted Cox models are fitted with each of the three forms of risk predictions as weights to assess the treatment effect while adjusting for potential confounding by including all the baseline risk factors as covariates. When subjects predicted to be at higher risk are weighted more than those at lower risk in the model, the treatment effect is assessed in a pseudo trial population with enriched basic risk.
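
The three weight forms can be sketched as below, under stated assumptions: the inverse Mills ratio is taken as phi(x) / Phi(x), so that applying it to the negative LP gives a value that increases with risk; the LP offset is assumed to be the smallest shift that makes every value positive (plus a small `eps`); and the weights are rescaled to sum to the sample size. None of these details is confirmed by the abstract, and all function names are hypothetical.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    # erfc keeps precision in the lower tail
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def inverse_mills(x):
    """phi(x) / Phi(x), the form assumed here; it decreases in x."""
    return std_normal_pdf(x) / std_normal_cdf(x)

def normalize(raw):
    """Rescale weights so that they sum to the sample size (mean of 1)."""
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]

def risk_weights(lp, eps=1e-3):
    """Return the LP, RS, and IMR weight forms for a list of linear predictors."""
    offset = eps - min(lp)  # assumed offset: shifts the minimum to eps
    shifted = [v + offset for v in lp]
    score = [math.exp(v) for v in lp]
    imr = [inverse_mills(-v) for v in lp]
    return {"LP": normalize(shifted), "RS": normalize(score), "IMR": normalize(imr)}

# Toy linear predictors (illustration only).
w = risk_weights([-1.0, 0.0, 1.0])
```

Each list in `w` has mean 1 and increases with the linear predictor, so higher-risk subjects receive larger weights in every form.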

Results

Among the 1,000 subjects in the simulated RCT, the baseline means (standard deviation, SD) of the continuous variables are 61.3 (14.8) years for age, 34.6 (8.8) for BMI, 105.9 (19.3) bpm for heart rate, 137.4 (23.2) mmHg for SBP, and 93.6 (5.5) % for SpO2. For the dichotomous variables, the frequencies (percentages) are 790 (79%) for elevated (versus non-elevated) troponin, 76 (7.6%) for high (versus non-high) disease severity, 109 (10.9%) for positive (versus negative) disease history, and 208 (20.8%) for positive (versus negative) cancer history. The SMDs of these distributions between the two arms are all less than 0.1. The overall 90-day cumulative AE incidence based on Kaplan-Meier (KM) estimates is 10.6% (treatment versus control: 9.6% versus 15.1%).

After the model selection process, the risk prediction model contains four predictors: heart rate, disease history, SBP, and elevated troponin. The model concordance is 72.7%, suggesting fair-to-good classification performance. Predicted values are calculated for subjects in both arms based on the risk prediction model. After constraining the sum of the weights to the sample size, all three forms of risk predictions have a mean of 1, and the SDs are 0.27, 0.76, and 0.49 for LP, RS, and IMR, respectively. Comparing the distributions of the three forms, LP is the most conservative, with a range from 0.0004 to 1.75, and appears symmetric around 1. RS is the most aggressive, increasing linearly from 0.04 to 1 and then exponentially up to 5.76. IMR appears to be a compromise between LP and RS, with a range from 0.01 to 2.65. After weighting with the risk predictions, the weighted KM estimators show that the cumulative AE incidences increase and that the differences in KM curves between the two arms also appear larger than in the unweighted data. The 90-day cumulative AE incidences using LP, RS, and IMR are 11.2% (treatment versus control: 10.2% versus 15.7%), 13% (11.8% versus 19%), and 12% (10.8% versus 17.9%), respectively.
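
A weighted KM cumulative incidence of the kind described above can be sketched as follows. This is an illustrative product-limit calculation with per-subject weights; the function name and toy data are hypothetical, and the weights here are arbitrary rather than model-based.

```python
def weighted_km_incidence(times, events, weights, horizon):
    """Weighted Kaplan-Meier cumulative incidence, 1 - S(horizon)."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        # weighted number at risk just before t, and weighted events at t
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        failed = sum(w for ti, e, w in zip(times, events, weights) if e and ti == t)
        survival *= 1.0 - failed / at_risk
    return 1.0 - survival

# Toy follow-up data in days (illustration only); 90 is the end of study.
times = [10, 20, 90, 90]
events = [1, 1, 0, 0]
unweighted = weighted_km_incidence(times, events, [1.0, 1.0, 1.0, 1.0], 90)
enriched = weighted_km_incidence(times, events, [2.0, 1.0, 0.5, 0.5], 90)
```

Giving larger weights to the subjects who experience events raises the estimated incidence, which mirrors the increase reported after weighting with the risk predictions.
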
Prior to weighting with risk predictions, the hazard ratio (HR) of treatment is 0.63 (p = 0.072) in a Cox model adjusted for the confounding effects of all baseline risk factors. After weighting in sampling-weighted Cox models, the HRs of treatment are 0.63 (p = 0.07), 0.59 (p = 0.079), and 0.58 (p = 0.044) using LP, RS, and IMR as weights, respectively. Based on these results, a significant treatment effect is detected when weighting with IMR: subjects under treatment are significantly protected from AE incidence, with a 42% lower risk than those without the treatment.
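
How subject weights enter a Cox model can be illustrated with a deliberately simplified sketch: treatment is the only covariate, ties are handled in the Breslow style, and the estimate is found by Newton-Raphson on the weighted partial likelihood. Unlike the analysis above, it does not adjust for baseline risk factors and does not produce a variance or p-value. The function name and toy data are hypothetical.

```python
import math

def weighted_cox_hr(times, events, treated, weights, iterations=25):
    """Hazard ratio for a binary treatment from a weighted partial likelihood."""
    beta = 0.0
    for _ in range(iterations):
        score, info = 0.0, 0.0
        for t_i, e_i, x_i, w_i in zip(times, events, treated, weights):
            if not e_i:
                continue  # censored subjects contribute only through risk sets
            s0 = s1 = 0.0
            for t_j, x_j, w_j in zip(times, treated, weights):
                if t_j >= t_i:  # subject j is still at risk at time t_i
                    r = w_j * math.exp(beta * x_j)
                    s0 += r
                    s1 += r * x_j
            p = s1 / s0  # weighted share of the treated in the risk set
            score += w_i * (x_i - p)
            info += w_i * p * (1.0 - p)  # valid because treatment is 0/1
        if info == 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return math.exp(beta)

# Toy data (illustration only).
times = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 1, 1, 1, 1, 0, 1, 0]
treated = [0, 1, 0, 0, 1, 0, 1, 1]
hr = weighted_cox_hr(times, events, treated, [1.0] * 8)
```

An HR below 1 indicates a protective effect; for example, the reported HR of 0.58 corresponds to a (1 - 0.58) = 42% lower hazard. Multiplying all weights by a constant leaves the estimate unchanged, which is why only the relative size of the risk-based weights matters for the point estimate.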

Discussion

Our results based on the simulated RCT show that the two-stage approach can help enrich the basic risk in trial populations and thus enhance the estimated treatment effect, that is, the contrast between the two arms. However, some important assumptions and limitations apply if the two-stage approach is to provide robust results. First, the basic risk learned from the control arm must equal the risk that the treatment arm would have if the treatment had no effect (or an effect equivalent to placebo or routine treatment). If unmeasured factors drove certain subjects to be included in one arm over the other, or if the treatment itself introduced risk independent of the basic risk, then the risk predictions would fail to properly enrich the risk for both arms. Second, a sufficient sample size may be needed in the control arm to build a reliable risk prediction model. Third, a parsimonious risk prediction model is preferred to avoid overfitting. If the risk prediction model fits the control arm data too closely, it may not generalize well to the treatment arm. In this study, we simulated an RCT based on real-world data as an application example to present and support the approach. In the example, IMR appears to be the best choice of weights, possibly because it is a compromise between the other two forms. Note that the value of this approach is not to enhance power in general but to enrich the basic risk in the trial population based on the risk predictions learned from the control arm data. The treatment effect and its significance will not be enhanced if the treatment does not reduce the risk associated with the important risk factors. A more comprehensive simulation study may be needed to examine the performance of this approach under various scenarios.

This two-stage approach is simple to apply and requires no additional data beyond what is already collected for the trial population in a study. It can be particularly useful when the treatment effect looks promising but fails to reach significance, possibly because the underlying basic risk in the trial population is lower than expected. After enriching the basic risk, the treatment effect may be enhanced and reach statistical significance.

Presenting Author

Peter Wilson, Inari Medical

CoAuthor(s)

Yu-Hsiang Shu, Inari Medical
Peter Wilson, Inari Medical
Yu-Chen Su

Topic Description

RWE
ASA Biopharmaceutical Section Regulatory-Industry Statistics Workshop 2024