04/30/2025: 10:30 AM - 12:00 PM MDT
Refereed
Room: Alpine West
Chair
Julian Chan, Weber State University
Target Audience
Expert
Tracks
Computational Statistics
Symposium on Data Science and Statistics (SDSS) 2025
Presentations
Statistical models often require inputs that are not completely known. This can occur when those inputs are measured with error or only indirectly, or when they correspond to an unobservable parameter in another model. A prominent application is environmental epidemiology, where individual air pollution exposure is a key variable for health outcomes, yet it cannot be observed directly and must be estimated by a model. In such cases, a common choice is two-stage Bayesian modeling, in which the two levels of the model are specified separately: the stage-one model estimates the unknown parameter, and those estimates are then used as inputs to the stage-two model. However, to target the correct posterior distributions, two-stage Bayesian models must correctly propagate uncertainty from the first stage to the second. In practice, researchers often fail to do so and rely on simplified, incorrect methods. We show, both analytically and empirically, the negative consequences of failing to account for this uncertainty, even in a simple setting. Plug-in methods that estimate and fix the inputs suffer from attenuation bias and underestimate uncertainties. Partial posterior methods that propagate uncertainty from the stage-one model without adjusting for the stage-two model fail to correct this bias and overstate uncertainties. We propose two algorithms for two-stage modeling that propagate uncertainty across the two stages. The first is a streamlined importance sampling algorithm that performs best when the inputs drawn from the stage-one posterior are approximately independent; the second provides a correction when they are not. Using analytical and empirical results in a variety of settings, we then show that, unlike the common competing methods, our algorithms can correctly propagate uncertainties and target the correct distributions when their assumptions are met.
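The abstract does not spell out its algorithms, so the following is only a generic, hypothetical illustration of how importance sampling can carry stage-one uncertainty into stage two; it is not the authors' method. It assumes a toy conjugate stage-two model y = βx + ε with known noise scale σ and prior β ~ N(0, τ²), and assumes that y and the stage-one data z are conditionally independent given x. Under those assumptions the joint posterior is proportional to p(y | x, β) p(β) p(x | z), so reweighting stage-one draws of x by the marginal likelihood p(y | x) and resampling targets p(x | y, z). The function names `log_marginal_lik` and `two_stage_sir`, and all numeric settings, are made up for this sketch.

```python
import numpy as np

def log_marginal_lik(y, x, sigma, tau):
    """log p(y | x) for y = beta * x + eps, eps ~ N(0, sigma^2 I), beta ~ N(0, tau^2).

    Integrating beta out gives y | x ~ N(0, sigma^2 I + tau^2 x x^T).
    """
    n = len(y)
    cov = sigma**2 * np.eye(n) + tau**2 * np.outer(x, x)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

def two_stage_sir(y, x_draws, sigma, tau, rng):
    """Hypothetical sampling-importance-resampling over stage-one draws of x.

    x_draws: (S, n) array of draws from the stage-one posterior p(x | z).
    Each draw is weighted by p(y | x); beta is then drawn from its conjugate
    conditional p(beta | x, y) for each resampled x.
    """
    logw = np.array([log_marginal_lik(y, x, sigma, tau) for x in x_draws])
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(x_draws), size=len(x_draws), p=w)
    betas = np.empty(len(idx))
    for s, i in enumerate(idx):
        x = x_draws[i]
        prec = x @ x / sigma**2 + 1 / tau**2  # posterior precision of beta given x
        mean = (x @ y / sigma**2) / prec
        betas[s] = rng.normal(mean, 1 / np.sqrt(prec))
    ess = 1.0 / np.sum(w**2)  # effective sample size of the importance weights
    return betas, ess

rng = np.random.default_rng(42)
n, S = 50, 2000
x_true = rng.normal(size=n)
y = 2.0 * x_true + rng.normal(scale=0.5, size=n)
# Stand-in for stage-one posterior draws (hypothetical): truth plus small noise
x_draws = x_true + 0.05 * rng.normal(size=(S, n))
betas, ess = two_stage_sir(y, x_draws, sigma=0.5, tau=10.0, rng=rng)
print(betas.mean(), betas.std(), ess)
```

Setting all weights equal would leave the x draws unadjusted by the stage-two likelihood. With many inputs the weights can collapse onto a few draws, which the effective sample size `ess` helps diagnose.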
Presenting Author
Konstantin Larin, Amherst College
First Author
Konstantin Larin, Amherst College
CoAuthor
Dan Kowal, Cornell University
Lung cancer is the leading cause of cancer-related deaths in the U.S., with non-small cell lung cancer (NSCLC) comprising approximately 85% of cases. Survival analysis for NSCLC is essential for identifying clinical and genomic biomarkers that influence progression-free survival (PFS), the time until disease progression or death due to NSCLC. Such biomarkers enable personalized treatment and prognosis prediction, improving patient outcomes and advancing precision oncology. In this study, we analyze a cohort of 216 U.S. patients with advanced NSCLC using two ensemble learning survival methods, random survival forests (RSF) and a gradient-boosted machine (GBM), along with a stratified Cox proportional hazards model. All models accounted for censoring. RSF grows many survival trees and derives overall hazard predictions by averaging the outputs of all trees. GBM uses regression trees as base learners and is optimized using the Cox proportional hazards model's partial log-likelihood. PFS prediction performance was evaluated with the concordance index (C-index). All models predicted better than chance (C-index above 0.5). GBM (C-index: 0.733) had the highest predictive performance, followed by RSF (C-index: 0.732) and the stratified Cox proportional hazards model (C-index: 0.726). Key biomarkers were identified using permutation- and impurity-based feature importance, and their effects on PFS were characterized with hazard ratios. The models identified several significant biomarkers, including circulating albumin, the derived neutrophil-to-lymphocyte ratio (dNLR), PD-L1 expression, and tumor mutational burden (TMB). Albumin and dNLR, markers of systemic inflammation, were associated with survival outcomes, reflecting the role of inflammation in cancer progression. PD-L1 and TMB, key immunotherapy biomarkers, showed modest protective effects, consistent with the benefit of immunotherapy for certain NSCLC patients.
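To make the evaluation metric concrete, here is a minimal sketch of Harrell's C-index for right-censored data, plus a generic permutation-importance loop that measures how much the C-index drops when one feature is shuffled. This is not the study's pipeline: the linear risk score, the two-column feature matrix, and the simulated cohort are hypothetical, and libraries such as scikit-survival provide tested implementations of the C-index.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when subject i had an observed event and
    time[i] < time[j]; it is concordant when risk[i] > risk[j], i.e. the
    subject who progressed first had the higher predicted risk. Tied risk
    scores count 1/2; pairs with tied times are skipped for simplicity.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(event):
        later = time > time[i]  # subjects still event-free after i's event time
        comparable += int(later.sum())
        concordant += np.sum(risk[i] > risk[later]) + 0.5 * np.sum(risk[i] == risk[later])
    return concordant / comparable

def permutation_importance(risk_fn, X, time, event, rng, n_repeats=20):
    """Mean drop in C-index when each column of X is shuffled independently."""
    base = harrell_c_index(time, event, risk_fn(X))
    drops = np.zeros(X.shape[1])
    for k in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, k] = rng.permutation(Xp[:, k])
            drops[k] += base - harrell_c_index(time, event, risk_fn(Xp))
    return drops / n_repeats

# Toy check: 4 of the 5 comparable pairs are ranked correctly
c = harrell_c_index(time=[2, 4, 6, 8], event=[1, 1, 0, 1], risk=[0.9, 0.5, 0.7, 0.1])
print(c)  # 0.8

# Hypothetical simulated cohort: only feature 0 affects the hazard
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
t_event = rng.exponential(scale=np.exp(-X[:, 0]))  # hazard rate exp(x0)
t_cens = rng.exponential(scale=2.0, size=200)
time, event = np.minimum(t_event, t_cens), t_event <= t_cens
imp = permutation_importance(lambda Z: Z @ np.array([1.0, 0.0]), X, time, event, rng)
print(imp)  # feature 1 is ignored by the risk score, so its importance is 0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values such as 0.726 to 0.733 indicate better-than-chance prediction.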
Presenting Author
Owen Sun, California Academy of Mathematics and Science
First Author
Owen Sun, California Academy of Mathematics and Science
CoAuthor
Olga Korosteleva, California State University-Long Beach
Anomalous diffusion refers to processes whose mean squared displacement grows nonlinearly with time, following E[X²(t)] ~ t^β, where β is the anomalous exponent. Such behavior, observed in complex systems like biological cells, often departs from traditional diffusion models. Classical approaches, such as fractional Brownian motion (FBM) and scaled Brownian motion (SBM), assume fixed exponents and therefore cannot account for dynamics with varying anomalous parameters. To overcome this limitation, models such as FBM with random exponents (FBMRE) and SBM with random exponents (SBMRE) have been developed. This work presents a universal procedure based on statistical testing to distinguish between anomalous diffusion models with constant and random anomalous exponents, using time-averaged statistics and their ratio-based counterparts. In addition, a novel approach to time-lag selection based on a divergence measure, specifically the Hellinger distance, is proposed. The methodology is broadly applicable for distinguishing constant from random anomalous diffusion; its effectiveness depends on the choice of statistics, time lags, and process characteristics, as demonstrated through simulations (using a two-point distribution of the anomalous exponent) and an analysis of real-world data.
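As a rough, hypothetical illustration of divergence-based time-lag selection (not the authors' test), the sketch below simulates SBM with a constant exponent and SBM whose exponent is drawn per trajectory from a two-point distribution, computes the time-averaged mean squared displacement (TAMSD) at several lags, and picks the lag where the Hellinger distance between the two sets of log-TAMSD values is largest. The exponent values, candidate lags, bin count, and function names are assumptions; the ratio-based statistics and the FBM variants mentioned in the abstract are not covered.

```python
import numpy as np

def simulate_sbm(n_paths, n_steps, alphas, rng):
    """Scaled Brownian motion on the grid t = 0, 1, ..., n_steps.

    X(t) = B(t**alpha), so E[X(t)^2] = t**alpha. `alphas` may be a scalar
    (constant exponent) or one exponent per path (random exponent).
    """
    t = np.arange(n_steps + 1, dtype=float)
    alphas = np.broadcast_to(np.asarray(alphas, dtype=float), (n_paths,))
    # Independent Gaussian increments with variance t_k^alpha - t_{k-1}^alpha
    var_inc = np.diff(t[None, :] ** alphas[:, None], axis=1)
    inc = rng.normal(size=(n_paths, n_steps)) * np.sqrt(var_inc)
    return np.concatenate([np.zeros((n_paths, 1)), np.cumsum(inc, axis=1)], axis=1)

def tamsd(paths, lag):
    """Time-averaged MSD of each path (row) at an integer lag >= 1."""
    d = paths[:, lag:] - paths[:, :-lag]
    return np.mean(d**2, axis=1)

def hellinger(a, b, bins=30):
    """Hellinger distance (between 0 and 1) of two samples' histograms on shared bins."""
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    p, q = p / p.sum(), q / q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

rng = np.random.default_rng(0)
n_paths, n_steps = 500, 200
# Hypothetical settings: constant alpha = 0.8 versus alpha drawn from {0.6, 1.0}
x_const = simulate_sbm(n_paths, n_steps, 0.8, rng)
x_rand = simulate_sbm(n_paths, n_steps, rng.choice([0.6, 1.0], size=n_paths), rng)

lags = [1, 5, 10, 20, 50]
dist = {lag: hellinger(np.log(tamsd(x_const, lag)), np.log(tamsd(x_rand, lag)))
        for lag in lags}
best_lag = max(dist, key=dist.get)
print(dist, best_lag)
```

The lag that maximizes the divergence is the one at which the chosen statistic best separates the two hypotheses in this simulated setting; in practice the preferred lag would depend on the statistic and the process, as the abstract notes.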
Presenting Author
Katarzyna Maraj-Zygmąt, Wrocław University of Science and Technology
First Author
Katarzyna Maraj-Zygmąt, Wrocław University of Science and Technology
CoAuthor(s)
Aleksandra Grzesiek, Wrocław University of Science and Technology
Diego Krapf, Colorado State University
Agnieszka Wyłomańska, Wrocław University of Science and Technology