Computationally Intensive Methods

Nasrine Bendjilali, Chair
Rowan University
 
Wednesday, Aug 6: 10:30 AM - 12:20 PM
4155 
Contributed Papers 
Music City Center 
Room: CC-212 

Main Sponsor

Section on Statistical Computing

Presentations

A computationally efficient TreeScan implementation via pruning

Tree-based scan statistics (TBSSs) are popular methods for conducting disproportionality analyses, allowing one to search for potential increased risks of drugs and vaccines among thousands of hierarchically related outcomes. Using the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, we compute statistically valid bounds on the p-values calculated by TBSS and use those bounds in a two-fold manner. First, we quickly estimate the number of signals in the data, exploiting the fact that for non-significant nodes, the p-value lower bound typically exceeds the significance threshold early in the TBSS run. Second, we prune non-significant nodes, thereby reducing the size of the tree and speeding up the computation. Using a real data example of clinical relevance (risk assessment of SGLT2 and GLP1 inhibitors via hierarchical testing over the International Classification of Diseases, ICD-10), we demonstrate that pruning considerably reduces the computational effort of TBSS while discovering the same signals.
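To make the pruning idea concrete, here is a minimal sketch (not the authors' implementation) of how a DKW-type band around a Monte Carlo p-value can justify discarding a node once the p-value's lower bound already exceeds the significance level; the function names and the tuple representation of tree nodes are hypothetical.

```python
# Minimal sketch (not the authors' implementation): bound a Monte Carlo
# p-value with a DKW-type band and prune tree nodes whose lower bound
# already exceeds the significance level.
import math
import numpy as np

def mc_pvalue_bounds(observed_stat, null_stats, band_alpha=0.05):
    """Monte Carlo p-value of one tree node with a DKW-type confidence band.

    observed_stat : scan statistic of the node on the real data
    null_stats    : the node's statistics from the Monte Carlo replications run so far
    """
    m = len(null_stats)
    p_hat = (1 + np.sum(np.asarray(null_stats) >= observed_stat)) / (m + 1)
    eps = math.sqrt(math.log(2.0 / band_alpha) / (2.0 * m))   # DKW half-width
    return max(p_hat - eps, 0.0), p_hat, min(p_hat + eps, 1.0)

def prune_nonsignificant(tree_nodes, alpha=0.05):
    """Keep only nodes that could still be significant; tree_nodes = [(observed_stat, null_stats), ...]."""
    kept = []
    for observed_stat, null_stats in tree_nodes:
        lower, _, _ = mc_pvalue_bounds(observed_stat, null_stats)
        if lower <= alpha:            # a lower bound above alpha proves non-significance
            kept.append((observed_stat, null_stats))
    return kept
```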

Keywords

hypothesis testing

TreeScan

hierarchical outcomes

International Classification of Diseases

TBSS

risks of drugs 

Co-Author

Shirley Wang

First Author

Georg Hahn

Presenting Author

Georg Hahn

Bias adjustment in scalar-on-function regression: An instrumental variable approach

Instrumental variables (IVs) are widely used to adjust for measurement error (ME) bias when assessing associations of health outcomes with ME-prone independent variables. IV approaches addressing ME in longitudinal models are well established, but few methods exist for functional regression. We develop two least squares-based methods to adjust for ME bias in scalar-on-function linear models, regressing a scalar outcome on an ME-prone functional covariate and using a functional IV for model identification. Our methods alleviate the computational challenges encountered when applying classical regression calibration for bias adjustment in high-dimensional settings and account for potential serial correlation across time. Simulations demonstrate faster run times, lower bias, and lower AIMSE for the proposed methods compared with existing approaches. We applied our methods to a cluster randomized trial investigating the association between body mass index and device-based energy expenditure among elementary school students in a Texas school district.
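As a rough illustration of the general instrumental-variable idea, and assuming a simple Fourier-basis representation of the curves, the sketch below runs two-stage least squares on basis scores; it is not the authors' two proposed estimators, and all function names are hypothetical.

```python
# Hypothetical sketch: two-stage least squares on Fourier-basis scores for a
# scalar-on-function model with an error-prone functional covariate W and a
# functional instrument Z.  This is not the authors' estimators.
import numpy as np

def fourier_basis(t, k):
    """First k Fourier basis functions (constant, sines, cosines) on a grid t in [0, 1]."""
    cols = [np.ones_like(t)]
    j = 1
    while len(cols) < k:
        cols.append(np.sin(2 * np.pi * j * t))
        if len(cols) < k:
            cols.append(np.cos(2 * np.pi * j * t))
        j += 1
    return np.column_stack(cols)

def basis_scores(curves, t, B):
    """Project each curve (a row of `curves`) onto the basis, assuming an equally spaced grid."""
    return (curves @ B) * (t[1] - t[0])

def iv_scalar_on_function(y, W, Z, t, k=7):
    """Stage 1: predict W's scores from Z's scores; stage 2: regress y on the predictions."""
    B = fourier_basis(t, k)
    SW, SZ = basis_scores(W, t, B), basis_scores(Z, t, B)
    SZ1 = np.column_stack([np.ones(len(y)), SZ])
    W_hat = SZ1 @ np.linalg.lstsq(SZ1, SW, rcond=None)[0]        # stage 1 fitted scores
    X2 = np.column_stack([np.ones(len(y)), W_hat])
    coef = np.linalg.lstsq(X2, y, rcond=None)[0]                 # stage 2 coefficients
    return coef[0], B @ coef[1:]                                  # intercept and beta(t) on the grid
```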

Keywords

Digital health

Functional data

Instrumental variable

Measurement error

Physical activity

Wearable devices

Co-Author(s)

Ufuk Beyaztas, Marmara University
Caihong Qin, Indiana University
Heyang Ji, Indiana University
Gilson Honvoh, Cincinnati Children's Hospital Medical Center
Roger Zoh, Indiana University
Mark Benden, Texas A&M University
Lan Xue, Oregon State University
Carmen Tekwe, Indiana University

First Author

Xiwei Chen, Indiana University

Presenting Author

Xiwei Chen, Indiana University

Development and Validation of Mortality Risk Scores for Persons with End-Stage Kidney Disease

Fitting a mixture cure survival model yields two sets of estimated coefficients and standard errors. Summarizing this model geographically, such as across zip codes or counties, may benefit practitioners and policymakers; for instance, such summaries can reveal spatial trends via visualizations. Summarizing the model output geographically involves two parts: (1) condensing the dataset spatially and (2) encapsulating a survival function in a single number, yielding risk scores. In this work, several methods are explored to accomplish these two tasks, and estimating the concordance statistic for each model allows the methods to be compared. The risk scores were developed on United States Renal Data System data comprising 2,228,693 patients who received their first end-stage kidney disease (ESKD) treatment between 2000 and 2020, and they are validated using the clinical measurements found within the ESKD dataset.
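The sketch below illustrates one way such a risk score could be constructed, assuming a logistic incidence part and a Weibull latency part for the mixture cure model and a fixed time horizon; the abstract does not specify the authors' actual score definitions, so every name and modeling choice here is an assumption.

```python
# Hypothetical sketch: a mixture cure survival function with a logistic
# incidence part and a Weibull latency part, a horizon-based risk score,
# and a simple geographic (zip-code) summary.  All names are assumptions.
import numpy as np

def mixture_cure_survival(t, x, beta_incidence, beta_latency, weibull_shape):
    """S(t | x) = pi(x) + (1 - pi(x)) * S_u(t | x)."""
    pi = 1.0 / (1.0 + np.exp(-(x @ beta_incidence)))    # long-term survivor ("cured") fraction
    scale = np.exp(x @ beta_latency)                     # Weibull scale via a log link
    s_uncured = np.exp(-(t / scale) ** weibull_shape)    # survival of the uncured group
    return pi + (1.0 - pi) * s_uncured

def risk_score(x, beta_incidence, beta_latency, weibull_shape, horizon=5.0):
    """One candidate score: probability of death within `horizon` years."""
    return 1.0 - mixture_cure_survival(horizon, x, beta_incidence, beta_latency, weibull_shape)

def summarize_by_region(scores, region_ids):
    """Condense patient-level scores to regional means (e.g., per zip code)."""
    region_ids = np.asarray(region_ids)
    return {r: float(scores[region_ids == r].mean()) for r in np.unique(region_ids)}
```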

Keywords

Finite mixture models

Survival modeling

survival mixture model

risk score



end stage kidney disease 

Co-Author(s)

Nathan Meyer, South Dakota State University
Hossein Moradi Rekabdarkolaee, South Dakota State University

First Author

Semhar Michael, South Dakota State University

Presenting Author

Semhar Michael, South Dakota State University

Efficient Statistical Computing with Mixed-Precision Power and GPU Acceleration

Mixed-precision computing optimizes large-scale computing by dynamically adjusting precision levels, reducing memory usage, computational time, and energy consumption without sacrificing accuracy. The NVIDIA Blackwell GB200 Superchip demonstrates this with FP16 achieving a 27.78x speedup over FP32 and 55.55x over FP64. This approach is increasingly vital as data sizes grow and computational demands escalate. Additionally, mixed precision enhances parallel computing efficiency, enabling faster processing in high-performance computing environments.

Statistical computing benefits by using lower precision for routine tasks and higher precision for critical operations, enhancing efficiency in large-scale models. This talk covers two applications, spatial data modeling and climate model emulation, showcasing mixed-precision performance. We will also introduce two R packages, MPCR (mixed/multi-precision computing) and TLAR (tile-based linear algebra), leveraging mixed precision for greater computational efficiency in R.
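As a generic illustration of the mixed-precision principle (unrelated to the MPCR or TLAR implementations), the sketch below solves a linear system by factoring once in single precision and then recovering double-precision accuracy through iterative refinement of the residual.

```python
# Generic mixed-precision illustration (not MPCR/TLAR code): solve A x = b by
# factoring A once in float32 and refining the residual in float64.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, iters=3):
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    lu, piv = lu_factor(A64.astype(np.float32))             # O(n^3) factorization in low precision
    x = lu_solve((lu, piv), b64.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b64 - A64 @ x                                    # residual computed in high precision
        x += lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
    return x

# Example: a well-conditioned system is solved to near double-precision accuracy.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500)) + 500 * np.eye(500)
b = rng.standard_normal(500)
print(np.max(np.abs(A @ mixed_precision_solve(A, b) - b)))
```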

Keywords

Mixed-precision computing

High-performance computing (HPC)

Parallel computing 

First Author

Sameh Abdulah, King Abdullah University of Science and Technology

Presenting Author

Sameh Abdulah, King Abdullah University of Science and Technology

WITHDRAWN Generalized Inference of Youden Index for Multi-Class Classification Applied to Parkinson's Disease

Parkinson's Disease (PD) is a progressive neurodegenerative disorder affecting millions worldwide. Accurate classification of PD severity using biomarker data is crucial for early diagnosis and disease monitoring. This study introduces a Generalized Variable Method (GVM)-based approach to improve statistical inference in multi-class classification problems, particularly PD classification using normally distributed biomarker data, where patients are categorized into three or more stages based on biomarker values (e.g., Mild = PD-N, Parkinson's Disease-Normal; Moderate = PD-MCI, Parkinson's Disease-Mild Cognitive Impairment; Severe = PD-D, Parkinson's Disease-Dementia). The proposed method ensures robust estimation of classification metrics, offering improved confidence interval estimation and decision-making strategies. We validate our approach through real-world biomarker datasets and Monte Carlo simulations, comparing its performance with traditional methods.
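For illustration only, the sketch below computes one common form of the three-class Youden index under normality by maximizing the summed separations over ordered cutoff pairs; it shows the point estimate being studied, not the GVM-based inference, and the specific index definition and all names are assumptions.

```python
# Hypothetical sketch of the three-class Youden index under normality (one of
# several definitions in the literature); the GVM-based inference itself is
# not reproduced here.
import numpy as np
from scipy.stats import norm

def youden_three_class(mu, sigma, grid_size=400):
    """mu, sigma: means and SDs of the three ordered groups (e.g., PD-N, PD-MCI, PD-D)."""
    lo, hi = min(mu) - 4 * max(sigma), max(mu) + 4 * max(sigma)
    t = np.linspace(lo, hi, grid_size)                    # candidate cutoffs
    F = [norm.cdf(t, m, s) for m, s in zip(mu, sigma)]    # group-wise CDFs on the grid
    a = F[0] - F[1]                                       # separation achieved by the lower cutoff
    b = F[1] - F[2]                                       # separation achieved by the upper cutoff
    best_b_from = np.maximum.accumulate(b[::-1])[::-1]    # best upper-cutoff term for each lower cutoff
    return 0.5 * float(np.max(a + best_b_from))           # maximize over ordered cutoff pairs

# Example with made-up stage means and SDs:
print(youden_three_class(mu=[0.0, 1.0, 2.0], sigma=[1.0, 1.0, 1.0]))
```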

Keywords

Multi-class Classification

Youden Index

Generalized Variable Method

Parkinson's Disease

Classical method

Machine Learning Approach 

Co-Author

Nayyer Qasim

First Author

Sumith Gunasekera

Network Bootstrap Using Overlapping Partitions

Bootstrapping network data efficiently is a challenging task. Existing methods tend to make strong assumptions about both the network structure and the statistics being bootstrapped, and they are computationally costly. This paper introduces a general network bootstrap algorithm, SSBoot, which partitions the network into multiple overlapping subnetworks and then aggregates results from bootstrapping these subnetworks to generate a bootstrap sample of the network statistic of interest. This approach tends to be much faster than competing methods because most of the computation is done on smaller subnetworks. We show that SSBoot is consistent in distribution for a large class of network statistics under minimal assumptions on the network structure, and we demonstrate with extensive numerical examples that the bootstrap confidence intervals produced by SSBoot attain good coverage without substantially increasing interval lengths, in a fraction of the time needed by competing methods.
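The sketch below loosely illustrates the overlapping-partition idea: sample overlapping node blocks, bootstrap a statistic on each induced subnetwork, and pool the replicates. It is not the SSBoot algorithm itself, whose partitioning and aggregation rules are not given in the abstract, and all names are hypothetical.

```python
# Loose illustration of an overlapping-partition network bootstrap (not the
# SSBoot algorithm): overlapping node blocks, per-block bootstrap of a
# statistic on the induced subnetwork, replicates pooled across blocks.
import numpy as np

def overlapping_blocks(n_nodes, n_blocks=5, overlap=0.2, rng=None):
    """Random node blocks whose sizes exceed n/n_blocks by the given overlap fraction."""
    if rng is None:
        rng = np.random.default_rng()
    size = int(np.ceil(n_nodes / n_blocks * (1 + overlap)))
    return [rng.choice(n_nodes, size=size, replace=False) for _ in range(n_blocks)]

def bootstrap_statistic(A, stat, n_boot=200, n_blocks=5, rng=None):
    """A: adjacency matrix; stat: function of an adjacency matrix."""
    if rng is None:
        rng = np.random.default_rng()
    reps = []
    for block in overlapping_blocks(A.shape[0], n_blocks, rng=rng):
        sub = A[np.ix_(block, block)]
        for _ in range(n_boot // n_blocks):
            idx = rng.choice(len(block), size=len(block), replace=True)   # resample nodes
            reps.append(stat(sub[np.ix_(idx, idx)]))
    return np.array(reps)

def edge_density(M):
    n = M.shape[0]
    return M.sum() / (n * (n - 1))

# Example: 95% bootstrap interval for the edge density of an Erdos-Renyi graph.
rng = np.random.default_rng(1)
A = np.triu((rng.random((300, 300)) < 0.05).astype(int), 1)
A = A + A.T
print(np.percentile(bootstrap_statistic(A, edge_density, rng=rng), [2.5, 97.5]))
```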

Keywords

Network analysis

Bootstrapping

Overlapping partitions

Subsampling

Confidence Interval

Network subsampling 

Co-Author

Elizaveta Levina, University of Michigan

First Author

Sayan Chakrabarty, University of Michigan

Presenting Author

Sayan Chakrabarty, University of Michigan