Wednesday, Aug 6: 10:30 AM - 12:20 PM
4155
Contributed Papers
Music City Center
Room: CC-212
Main Sponsor
Section on Statistical Computing
Presentations
Tree-based scan statistics (TBSSs) are popular methods for conducting disproportionality analyses with many hierarchically related outcomes, allowing one to search for potential increased risks of drugs and vaccines among thousands of such outcomes. Using the Dvoretzky-Kiefer-Wolfowitz inequality, we compute statistically valid bounds on the p-values calculated by a TBSS and use those bounds in a two-fold manner. First, we quickly estimate the number of signals in the data, exploiting the fact that for non-significant nodes, the p-value lower bound usually exceeds the significance threshold early in the TBSS run. Second, we prune non-significant nodes, thereby reducing the size of the tree and speeding up the computation. Using a real data example of clinical relevance (risk assessment of SGLT2 inhibitors and GLP-1 receptor agonists via the hierarchy given by the International Classification of Diseases, ICD-10), we demonstrate that pruning considerably reduces the computational effort of a TBSS while discovering the same signals.
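The pruning rule can be illustrated with a short sketch. The Python code below is not the authors' TreeScan implementation; the Monte Carlo setup and function names are assumptions for illustration. It shows how the Dvoretzky-Kiefer-Wolfowitz inequality turns a partial set of null replicates into a statistically valid band around a node's Monte Carlo p-value, which can then be used to prune the node or flag it as a likely signal early.

```python
"""Illustrative sketch of DKW-based p-value bounds for early pruning.

Not the authors' TreeScan implementation; the Monte Carlo setup and
names (node_statistic, null_statistics) are assumptions for this example.
"""
import numpy as np


def dkw_epsilon(m, alpha=0.05):
    """Half-width of the DKW band for an empirical CDF based on m replicates."""
    return np.sqrt(np.log(2.0 / alpha) / (2.0 * m))


def p_value_bounds(node_statistic, null_statistics, alpha=0.05):
    """Monte Carlo p-value estimate with DKW lower/upper bounds."""
    null_statistics = np.asarray(null_statistics)
    m = null_statistics.size
    p_hat = np.mean(null_statistics >= node_statistic)  # empirical p-value
    eps = dkw_epsilon(m, alpha)
    return max(p_hat - eps, 0.0), p_hat, min(p_hat + eps, 1.0)


def prune_decision(node_statistic, null_statistics, threshold=0.05, alpha=0.05):
    """Prune a node whose p-value lower bound already exceeds the threshold."""
    lower, p_hat, upper = p_value_bounds(node_statistic, null_statistics, alpha)
    if lower > threshold:
        return "prune"   # cannot become significant at the chosen level
    if upper < threshold:
        return "signal"  # significant even under the conservative bound
    return "keep"        # undecided: continue Monte Carlo replication


# Toy usage with simulated null replicates for one node
rng = np.random.default_rng(1)
null_stats = rng.chisquare(df=1, size=2000)   # stand-in null distribution
print(prune_decision(node_statistic=1.2, null_statistics=null_stats))
```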
Keywords
hypothesis testing
TreeScan
hierarchical outcomes
International Classification of Diseases
TBSS
risks of drugs
Instrumental variables (IVs) are widely used to adjust for measurement error (ME) bias when assessing associations of health outcomes with ME-prone independent variables. IV approaches addressing ME in longitudinal models are well established, but few methods exist for functional regression. We develop two least squares-based methods to adjust for ME bias in scalar-on-function linear models, regressing a scalar outcome on an ME-prone functional covariate and using a functional IV for model identification. Our methods alleviate potential computational challenges encountered when applying classical regression calibration methods for bias adjustment in high-dimensional settings and adjust for potential serial correlation across time. Simulations demonstrate faster run times, lower bias, and lower AIMSE for the proposed methods compared to existing approaches. We apply our methods to a cluster randomized trial investigating the association between body mass index and device-based energy expenditure among elementary school students in a Texas school district.
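As background for the identification strategy, the sketch below illustrates a generic two-stage least-squares correction for a scalar-on-function model with an ME-prone functional covariate and a functional instrument. It is only an illustration of the IV idea, not the two estimators proposed in this talk; the basis choice, simulated data, and variable names are assumptions.

```python
"""Generic two-stage least-squares sketch for a scalar-on-function model
with a measurement-error-prone functional covariate and a functional IV.
Illustration of the IV idea only, not the authors' estimators; the basis
choice and variable names are assumptions."""
import numpy as np

rng = np.random.default_rng(0)
n, T, K = 300, 50, 5                      # subjects, grid points, basis functions
t = np.linspace(0, 1, T)
basis = np.array([np.sqrt(2) * np.sin((k + 1) * np.pi * t) for k in range(K)]).T  # T x K

scores_x = rng.normal(size=(n, K))        # true functional covariate scores
X = scores_x @ basis.T                    # n x T true curves
W = X + rng.normal(scale=1.0, size=(n, T))   # ME-prone observed curves
Z = X + rng.normal(scale=0.5, size=(n, T))   # instrument: correlated with X, independent of W's error

beta_true = np.array([1.0, -0.5, 0.8, 0.0, 0.3])
y = scores_x @ beta_true + rng.normal(scale=0.5, size=n)

# Project observed curves onto the basis (numerical integration on the grid)
dt = t[1] - t[0]
S_w = W @ basis * dt                      # n x K scores of W
S_z = Z @ basis * dt                      # n x K scores of the instrument

# Stage 1: predict the ME-prone scores from the instrument scores
H = S_z @ np.linalg.lstsq(S_z, S_w, rcond=None)[0]   # fitted W-scores

# Stage 2: regress the outcome on the stage-1 fitted scores
beta_iv = np.linalg.lstsq(H, y, rcond=None)[0]
beta_naive = np.linalg.lstsq(S_w, y, rcond=None)[0]  # attenuated by ME

print("IV-adjusted coefficients:", np.round(beta_iv, 2))
print("Naive coefficients:      ", np.round(beta_naive, 2))
```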
Keywords
Digital health
Functional data
Instrumental variable
Measurement error
Physical activity
Wearable devices
Co-Author(s)
Ufuk Beyaztas, Marmara University
Caihong Qin, Indiana University
Heyang Ji, Indiana University
Gilson Honvoh, Cincinnati Children's Hospital Medical Center
Roger Zoh, Indiana University
Mark Benden, Texas A&M University
Lan Xue, Oregon State University
Carmen Tekwe, Indiana University
First Author
Xiwei Chen, Indiana University
Presenting Author
Xiwei Chen, Indiana University
Fitting a mixture cure survival model yields two sets of estimated coefficients and standard errors. Summarizing this model geographically, such as across zip codes or counties, may benefit practitioners and policymakers; for instance, such summaries can be used to visualize spatial trends. Summarizing the model output geographically involves two parts: (1) condensing the dataset spatially and (2) encapsulating a survival function in a single number, which leads to the development of risk scores. In this work, several methods for accomplishing these two tasks are explored, and estimating the concordance statistic for each model allows the methods to be compared. The risk scores were developed for United States Renal Data System data comprising 2,228,693 patients who received their first end-stage kidney disease (ESKD) treatment between 2000 and 2020, and they are validated using the clinical measurements available in the ESKD dataset.
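A single-number risk score of the kind described here can be obtained from the standard mixture cure decomposition S(t | x) = pi(x) + (1 - pi(x)) S_u(t | x). The sketch below is illustrative only; the coefficients, Weibull latency, horizon, and county labels are assumptions, not the fitted USRDS model. It computes such a score as the predicted event probability by a fixed horizon and averages it within geographic units.

```python
"""Illustrative risk score from a mixture cure model.

S(t | x) = pi(x) + (1 - pi(x)) * S_u(t | x), where pi(x) is the cure
probability (logistic incidence part) and S_u is the latency survival.
The coefficients, horizon, and Weibull latency are assumptions for this
sketch, not the fitted USRDS model."""
import numpy as np
import pandas as pd


def cure_probability(x, gamma):
    """Logistic incidence model for the cured fraction."""
    return 1.0 / (1.0 + np.exp(-(x @ gamma)))


def latency_survival(t, x, beta, shape=1.3, scale=5.0):
    """Weibull proportional-hazards latency survival for the uncured."""
    baseline = np.exp(-((t / scale) ** shape))
    return baseline ** np.exp(x @ beta)


def risk_score(x, gamma, beta, horizon=5.0):
    """One-number summary: probability of the event by the horizon."""
    pi = cure_probability(x, gamma)
    s_overall = pi + (1.0 - pi) * latency_survival(horizon, x, beta)
    return 1.0 - s_overall


# Toy patients with a county label, then a county-level summary
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
county = rng.integers(0, 5, size=1000)
gamma = np.array([0.8, -0.4, 0.2])   # incidence coefficients (illustrative)
beta = np.array([0.5, 0.3, -0.2])    # latency coefficients (illustrative)

scores = risk_score(X, gamma, beta)
summary = pd.DataFrame({"county": county, "risk": scores}).groupby("county")["risk"].mean()
print(summary.round(3))
```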
Keywords
Finite mixture models
Survival modeling
survival mixture model
risk score
end-stage kidney disease
Mixed-precision computing optimizes large-scale computation by dynamically adjusting precision levels, reducing memory usage, computational time, and energy consumption without sacrificing accuracy. The NVIDIA Blackwell GB200 Superchip demonstrates this with FP16 achieving a 27.78x speedup over FP32 and 55.55x over FP64. This approach is increasingly vital as data sizes grow and computational demands escalate. Additionally, mixed precision enhances parallel computing efficiency, enabling faster processing in high-performance computing environments.
Statistical computing benefits from using lower precision for routine tasks and higher precision for critical operations, enhancing efficiency in large-scale models. This talk covers two applications, spatial data modeling and climate model emulation, showcasing mixed-precision performance. We will also introduce two R packages, MPCR (mixed/multi-precision computing) and TLAR (tile-based linear algebra), leveraging mixed precision for greater computational efficiency in R.
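Because MPCR and TLAR are R packages whose interfaces are not reproduced here, the sketch below illustrates the underlying mixed-precision idea in a generic setting: iterative refinement, where the expensive solve runs in single precision while residuals are accumulated in double precision. The test matrix, sizes, and number of refinement steps are assumptions for illustration.

```python
"""Mixed-precision iterative refinement sketch (not the MPCR/TLAR API).

The bulk of the work (the solve) runs in float32, while residuals are
accumulated in float64 to recover near-double-precision accuracy."""
import numpy as np

rng = np.random.default_rng(0)
n = 500
A64 = rng.normal(size=(n, n)) + n * np.eye(n)   # well-conditioned test matrix
b64 = rng.normal(size=n)

A32 = A64.astype(np.float32)                    # low-precision copy for heavy work

x = np.linalg.solve(A32, b64.astype(np.float32)).astype(np.float64)
for _ in range(3):                              # refine in double precision
    r = b64 - A64 @ x                           # high-precision residual
    dx = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
    x += dx

x_ref = np.linalg.solve(A64, b64)
print("relative error after refinement:",
      np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref))
```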
Keywords
Mixed-precision computing
High-performance computing (HPC)
Parallel computing
First Author
Sameh Abdulah, King Abdullah University of Science and Technology
Presenting Author
Sameh Abdulah, King Abdullah University of Science and Technology
Parkinson's Disease (PD) is a progressive neurodegenerative disorder affecting millions worldwide. Accurate classification of PD severity using biomarker data is crucial for early diagnosis and disease monitoring. This study introduces a Generalized Variable Method (GVM)-based approach to improve statistical inference in multi-class classification problems with normally distributed biomarker data. In our motivating application, patients are categorized into three or more stages of PD based on biomarker values, for example Mild (PD-N, Parkinson's Disease-Normal), Moderate (PD-MCI, Parkinson's Disease-Mild Cognitive Impairment), and Severe (PD-D, Parkinson's Disease-Dementia). The proposed method ensures robust estimation of classification metrics, offering improved confidence interval estimation and decision-making strategies. We validate our approach through real-world biomarker datasets and Monte Carlo simulations, comparing its performance with traditional methods.
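The generalized-variable construction can be sketched as follows: draw generalized pivotal quantities for each group's normal mean and standard deviation, evaluate a multi-class classification index (here a three-class Youden-type index at fixed cut points) on each draw, and take percentiles of the resulting values as a generalized confidence interval. The code below is an illustration under assumed data and an assumed index, not the study's exact metric or datasets.

```python
"""Generalized-variable (generalized pivotal quantity) sketch for a
three-class classification index under normal biomarkers.  The index,
cut-point grid, and simulated data are illustrative assumptions."""
import numpy as np
from scipy import stats


def gpq_mu_sigma(xbar, s2, n, size, rng):
    """Generalized pivotal quantities for a normal mean and SD."""
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size)
    mu = xbar - rng.standard_normal(size) * np.sqrt(sigma2 / n)
    return mu, np.sqrt(sigma2)


def youden3(mu, sd, c1, c2):
    """Three-class Youden-type index for ordered groups at cuts c1 < c2."""
    p1 = stats.norm.cdf(c1, mu[0], sd[0])                       # correctly 'mild'
    p2 = stats.norm.cdf(c2, mu[1], sd[1]) - stats.norm.cdf(c1, mu[1], sd[1])
    p3 = 1.0 - stats.norm.cdf(c2, mu[2], sd[2])                 # correctly 'severe'
    return 0.5 * (p1 + p2 + p3 - 1.0)


rng = np.random.default_rng(3)
# Simulated biomarker values for PD-N, PD-MCI, PD-D groups
groups = [rng.normal(m, s, 60) for m, s in [(0, 1), (1.2, 1.1), (2.5, 1.3)]]
B = 5000
draws = [gpq_mu_sigma(g.mean(), g.var(ddof=1), g.size, B, rng) for g in groups]

# Choose cut points by maximizing the index at the sample estimates
mu_hat = [g.mean() for g in groups]
sd_hat = [g.std(ddof=1) for g in groups]
c_grid = np.linspace(-2, 5, 80)
best = max((youden3(mu_hat, sd_hat, c1, c2), c1, c2)
           for i, c1 in enumerate(c_grid) for c2 in c_grid[i + 1:])
_, c1, c2 = best

# Evaluate the index at those cuts for every GPQ draw (vectorized)
mu_draws = np.column_stack([d[0] for d in draws])   # B x 3
sd_draws = np.column_stack([d[1] for d in draws])
index = youden3(mu_draws.T, sd_draws.T, c1, c2)
print("95% generalized CI for the index:", np.percentile(index, [2.5, 97.5]).round(3))
```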
Keywords
Multi-class Classification
Youden Index
Generalized Variable Method
Parkinson's Disease
Classical method
Machine Learning Approach
Bootstrapping network data efficiently is a challenging task. Existing methods tend to make strong assumptions on both the network structure and the statistics being bootstrapped, and they are computationally costly. This paper introduces a general algorithm, SSBoot, for network bootstrap that partitions the network into multiple overlapping subnetworks and then aggregates results from bootstrapping these subnetworks to generate a bootstrap sample of the network statistic of interest. This approach tends to be much faster than competing methods because most of the computation is done on smaller subnetworks. We show that SSBoot is consistent in distribution for a large class of network statistics under minimal assumptions on the network structure, and we demonstrate with extensive numerical examples that the bootstrap confidence intervals produced by SSBoot attain good coverage, without substantially increasing interval lengths, in a fraction of the time needed by competing methods.
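The overlapping-subnetwork idea can be conveyed with a small sketch: draw overlapping node subsets, bootstrap within each induced subnetwork, and pool the replicates for the statistic of interest. The code below illustrates only this general scheme (edge density on a simulated Erdos-Renyi graph); SSBoot's actual partitioning and aggregation rules are more involved and are not reproduced here.

```python
"""Illustrative subnetwork bootstrap for a network statistic (edge density).

Sketches the general idea of bootstrapping within overlapping subnetworks
and pooling the results; it is not the SSBoot algorithm, whose partitioning
and aggregation rules are more involved."""
import numpy as np

rng = np.random.default_rng(4)

# Simulate an Erdos-Renyi network as a symmetric adjacency matrix
n, p = 400, 0.05
A = (rng.random((n, n)) < p).astype(int)
A = np.triu(A, 1)
A = A + A.T


def edge_density(adj):
    m = adj.shape[0]
    return adj[np.triu_indices(m, 1)].mean()


# Overlapping subnetworks: random node subsets that jointly cover the graph
n_sub, sub_size = 8, 150
subsets = [rng.choice(n, size=sub_size, replace=False) for _ in range(n_sub)]

# Node bootstrap within each subnetwork, then pool the replicates
B = 200
reps = []
for nodes in subsets:
    A_sub = A[np.ix_(nodes, nodes)]
    for _ in range(B):
        idx = rng.integers(0, sub_size, size=sub_size)   # resample nodes with replacement
        reps.append(edge_density(A_sub[np.ix_(idx, idx)]))

ci = np.percentile(reps, [2.5, 97.5])
print("estimate:", round(edge_density(A), 4), "bootstrap CI:", ci.round(4))
```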
Keywords
Network analysis
Bootstrapping
Overlapping partitions
Subsampling
Confidence Interval
Network subsampling