Student Paper Award and John M. Chambers Statistical Software Award

Sarah Lotspeich Chair
Wake Forest University
 
Philip Waggoner Organizer
YouGov America
 
Monday, Aug 5: 2:00 PM - 3:50 PM
1520 
Topic-Contributed Paper Session 
Oregon Convention Center 
Room: CC-E141 
Presentations by the winners of the Student Paper Award and the John M. Chambers Statistical Software Award.

Applied

No

Main Sponsor

Section on Statistical Computing

Co Sponsors

Section on Statistical Graphics

Presentations

A reproducible pipeline for extracting representative signals from wire cuts

We propose a reproducible pipeline for extracting representative signals from 2D topographic scans of the tips of cut wires. The process fully addresses many potential problems in the quality of wire cuts, including edge effects, extreme values, trends, missing values, angles, and warping. The resulting signals can then be used in source determination, which plays an important role in forensic examinations. With commonly used measures such as the cross-correlation function, the procedure keeps the false positive and false negative rates at the same desirable levels as the manual extraction pipeline while outperforming it in robustness and objectivity.
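The cross-correlation function mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the lag convention, and the `min_overlap` parameter are assumptions made for illustration, and the input values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined for a constant segment; treated as 0 here.
        return 0.0
    return sxy / sqrt(sxx * syy)


def max_cross_correlation(x, y, max_lag, min_overlap=3):
    """Return (best correlation, best lag) over lags in [-max_lag, max_lag].

    Assumed convention: at a given lag, x[i] is compared with y[i + lag].
    Lags whose overlap is shorter than min_overlap are skipped.
    """
    best_r, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x, y[lag:]
        else:
            xs, ys = x[-lag:], y
        n = min(len(xs), len(ys))
        if n < min_overlap:
            continue
        r = pearson(xs[:n], ys[:n])
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag


# Made-up signals: `shifted` is `signal` delayed by two positions.
signal = [0, 1, 3, 2, 5, 4, 6, 1, 0, 2]
shifted = [9, 9] + signal[:-2]
best_r, best_lag = max_cross_correlation(signal, shifted, max_lag=4)
print(best_r, best_lag)
```

Taking the maximum over lags makes the score insensitive to a horizontal offset between two signals, which is one reason such a measure is commonly used to compare extracted profiles.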

Co-Author(s)

Heike Hofmann, Iowa State University
Yuhang Lin, Center for Statistics and Applications in Forensic Evidence (CSAFE), Iowa State University

Speaker

Yuhang Lin, Center for Statistics and Applications in Forensic Evidence (CSAFE), Iowa State University

Cubble: An R Package for Organizing and Wrangling Multivariate Spatio-temporal Data

Multivariate spatio-temporal data have a spatial component referring to the location of each observation, a temporal component recorded at regular or irregular time intervals, and multiple variables measured at each spatial and temporal value. Often, such data are fragmented, reflecting a common practice of focusing on either the spatial or the temporal aspect separately. This fragmentation makes the data difficult to handle coherently and comprehensively. This work introduces a new data structure to facilitate the study of different portions or combinations of spatio-temporal data for exploratory data analysis. The proposed structure, implemented in the R package cubble, organizes spatial and temporal variables as two facets of a single data object, allowing them to be wrangled separately or combined while ensuring synchronization.
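The idea of two synchronized facets in one object can be sketched as follows. This is a hypothetical Python illustration of the concept only, not the cubble API: the class `SpatioTemporal`, its methods, and the station values are all made up.

```python
from dataclasses import dataclass


@dataclass
class SpatioTemporal:
    # Spatial facet: site id -> fixed attributes (e.g. coordinates).
    spatial: dict
    # Temporal facet: site id -> list of time-stamped records.
    temporal: dict

    def __post_init__(self):
        # Both facets must describe exactly the same set of sites.
        if set(self.spatial) != set(self.temporal):
            raise ValueError("spatial and temporal facets must share site ids")

    def filter_sites(self, predicate):
        """Keep sites whose spatial attributes satisfy predicate.

        The temporal facet is subset with the same ids, so the two
        facets stay synchronized.
        """
        keep = [sid for sid, attrs in self.spatial.items() if predicate(attrs)]
        return SpatioTemporal(
            {sid: self.spatial[sid] for sid in keep},
            {sid: self.temporal[sid] for sid in keep},
        )

    def long(self):
        """Combine the facets: one flat record per site and time point."""
        return [
            {"id": sid, **self.spatial[sid], **record}
            for sid, records in self.temporal.items()
            for record in records
        ]


# Made-up weather stations for illustration.
stations = SpatioTemporal(
    spatial={
        "A": {"lat": -37.8, "lon": 145.0},
        "B": {"lat": -12.4, "lon": 130.9},
    },
    temporal={
        "A": [{"date": "2020-01-01", "tmax": 30.1},
              {"date": "2020-01-02", "tmax": 28.4}],
        "B": [{"date": "2020-01-01", "tmax": 33.0}],
    },
)
southern = stations.filter_sites(lambda attrs: attrs["lat"] < -30)
print(southern.long())
```

Filtering on the spatial facet and then switching to the combined form mirrors the workflow the abstract describes: each facet can be wrangled on its own, and the shared site ids keep them aligned.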

Co-Author

Sherry Zhang

Speaker

Sherry Zhang

Revisiting Link Prediction with the Dowker Complex

We propose a novel method to study properties of graph-structured data by means of a geometric construction called the Dowker complex. We study this simplicial complex through the use of persistent homology, which has proven to be a prominent tool for uncovering relevant geometric and topological information in data. A positively weighted graph induces a distance on its set of vertices. A classical approach in persistent homology is to construct a filtered Vietoris-Rips complex whose vertices are the vertices of the graph. However, when the vertex set of the graph is large, the resulting simplicial complex may be computationally hard to handle. As a solution, the Dowker complex is constructed on a sample of the vertices of the graph, called landmarks. One way to guarantee sparsity of the set of landmarks and its proximity to all the vertices of the graph is to consider ε-nets. We provide theoretical proofs of the stability of the Dowker construction and a comparison with the Vietoris-Rips construction. We perform experiments showing that the neural network model based on the Dowker complex performs well relative to baseline methods.
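The landmark selection step can be illustrated with a greedy ε-net on the shortest-path distance of a weighted graph. This is a minimal sketch under assumptions, not the authors' implementation: the graph representation (a dict of neighbor-to-weight dicts), the function names, and the convention that a vertex is covered when its distance to a landmark is at most ε are all chosen for illustration.

```python
import heapq


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: shortest-path distance from source to every
    reachable vertex of a graph with positive edge weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def greedy_epsilon_net(graph, eps):
    """Pick landmarks so that every vertex lies within eps of a landmark
    (proximity) and distinct landmarks are more than eps apart (sparsity)."""
    landmarks = []
    covered = set()
    for v in graph:
        if v in covered:
            continue
        # v is farther than eps from every landmark chosen so far.
        landmarks.append(v)
        dist = shortest_path_lengths(graph, v)
        covered.update(u for u, d in dist.items() if d <= eps)
    return landmarks


# Made-up example: a path graph a - b - c - d - e with unit weights.
path = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0},
    "e": {"d": 1.0},
}
print(greedy_epsilon_net(path, eps=1.0))
```

The result depends on the order in which vertices are visited, so different orderings can yield different, equally valid nets. The number of landmarks, rather than the number of vertices, then drives the size of the complex built on them.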

Speaker

Jae Choi, University of Texas at Dallas

Rforce: Random Forest for Composite Endpoints

Medical research often involves the study of composite endpoints that combine multiple clinical events to assess the efficacy of treatments. When constructing composite endpoints, it is common practice to analyze the time to the first event. However, this approach overlooks outcomes that occur after the first event, resulting in information loss. Furthermore, the terminal event may be not only an outcome of interest but also a competing risk for other types of outcomes. While regression models exist that analyze all such outcomes, not just the first event, and properly address the terminal event, they do not account for nonlinear relationships between covariates and composite endpoints. To address these important issues, we introduce Random FORest for Composite Endpoints (Rforce) for composite endpoints consisting of non-fatal composite events and terminal events. The proposed method handles the dependent censoring due to the terminal events with the concept of pseudo-risk time. In simulation studies, Rforce demonstrates performance comparable to existing regression-based models under linear settings and outperforms competing methods under nonlinear settings.

Speaker

Yu Wang, Medical College of Wisconsin

Ultra-efficient MCMC for Bayesian longitudinal functional data analysis

Functional mixed models are widely useful for regression analysis with dependent functional data, including longitudinal functional data with scalar predictors. However, existing algorithms for Bayesian inference with these models provide either scalable computing or accurate approximations to the posterior distribution, but not both. We introduce a new MCMC sampling strategy for highly efficient and fully Bayesian regression with longitudinal functional data. Using a novel blocking structure paired with an orthogonalized basis reparametrization, our algorithm jointly samples the fixed effects regression functions together with all subject- and replicate-specific random effects functions. Crucially, the joint sampler optimizes sampling efficiency for these key parameters while preserving computational scalability. Perhaps surprisingly, our new MCMC sampling algorithm even surpasses state-of-the-art algorithms for frequentist estimation and variational Bayes approximations for functional mixed models, while also providing accurate posterior uncertainty quantification, and is orders of magnitude faster than existing Gibbs samplers.

Co-Author

Dan Kowal, Cornell University

Speaker

Thomas Sun, Rice University