Sunday, Aug 3: 2:00 PM - 3:50 PM
0584
Topic-Contributed Paper Session
Music City Center
Room: CC-101C
Applied
No
Main Sponsor
Section on Statistical Graphics
Co Sponsors
Section on Statistical Computing
Presentations
The current approach to forensic bullet comparison relies on manual examination by forensic examiners to determine whether bullets were discharged from the same firearm. This process is highly subjective, prompting the development of algorithmic methods that provide objective statistical support for comparisons. However, a gap exists between the technical understanding these algorithms require and the typical background of many forensic examiners. We present a visualization tool designed to bridge this gap by presenting statistical information in a format familiar to forensic professionals. This forensic bullet comparison visualizer features a variety of plots that enable the user to examine every step of the algorithmic comparison process. We demonstrate the utility of the tool by applying it to data from the Houston Science Lab, where it helped identify an error in the comparison process caused by mislabeling. The tool can also support future investigations, such as examining how the distance between shots affects comparison scores. The framework offers a user-friendly way to convey complex statistical information to forensic examiners, facilitating their understanding and use of algorithmic comparison methods.
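To make the "algorithmic comparison" concrete, the sketch below computes a maximum cross-correlation score between two land-engraved-area signatures, the kind of intermediate quantity the visualizer's plots expose. The function name, lag window, and toy signals are illustrative assumptions, not the presenters' pipeline, which involves additional preprocessing and scoring steps.

```python
import numpy as np

def ccf_score(sig_a, sig_b, max_lag=50):
    """Maximum normalized cross-correlation between two land-engraved-area
    signatures, searched over a window of lags. Illustrative only; real
    pipelines add smoothing, outlier removal, and further features beyond
    the cross-correlation function."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = sig_a[lag:], sig_b[:len(sig_b) - lag]
        else:
            a, b = sig_a[:len(sig_a) + lag], sig_b[-lag:]
        n = min(len(a), len(b))
        a, b = a[:n], b[:n]
        if n < 2 or a.std() == 0 or b.std() == 0:
            continue
        best = max(best, np.corrcoef(a, b)[0, 1])
    return best

# Toy example: a shared groove pattern plus independent measurement noise
rng = np.random.default_rng(1)
base = np.sin(np.linspace(0, 20, 500))
score = ccf_score(base + 0.1 * rng.normal(size=500),
                  base + 0.1 * rng.normal(size=500))
print(round(score, 3))  # close to 1 for same-source signatures
```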
Keywords
Data Visualization
Interactive Forensic Modeling
Cross-correlation function
Land engraved area
Forensic pattern analysis
Forensic statistics
We present FAST, an optimization framework for fast additive segmentation. FAST segments piecewise constant shape functions for each feature in a dataset to produce transparent additive models. The framework leverages a novel optimization procedure to fit these models ∼2 orders of magnitude faster than existing state-of-the-art methods, such as explainable boosting machines. We also develop new feature selection algorithms in the FAST framework to fit parsimonious models that perform well. Through experiments and case studies, we show that FAST improves the computational efficiency and interpretability of additive models.
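As a point of reference for the model class involved, the sketch below fits piecewise constant shape functions by naive backfitting over fixed quantile bins. It illustrates what an additive model with per-feature step functions looks like; it is not the FAST optimization procedure, and all names and binning choices here are assumptions for illustration.

```python
import numpy as np

def fit_piecewise_constant_additive(X, y, n_bins=8, n_rounds=10):
    """Naive backfitting of piecewise constant shape functions, one per
    feature: each round, a feature's shape is set to the bin-wise mean of
    its partial residual. Illustrates the model class; it is NOT FAST,
    which selects breakpoints and is far faster than fixed-bin backfitting."""
    n, p = X.shape
    intercept = y.mean()
    # Precompute quantile-bin membership for each feature
    bins = [np.digitize(X[:, j],
                        np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1]))
            for j in range(p)]
    shapes = [np.zeros(n_bins) for _ in range(p)]
    fitted = np.full(n, intercept)
    for _ in range(n_rounds):
        for j in range(p):
            contrib_j = shapes[j][bins[j]]
            resid = y - (fitted - contrib_j)  # partial residual for feature j
            new_shape = np.array([resid[bins[j] == b].mean()
                                  if np.any(bins[j] == b) else 0.0
                                  for b in range(n_bins)])
            fitted += new_shape[bins[j]] - contrib_j
            shapes[j] = new_shape
    return intercept, shapes

# Toy check: additive step-like signal in two features
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = (X[:, 0] > 0) + 2 * (X[:, 1] > 0.5) + 0.1 * rng.normal(size=500)
intercept, shapes = fit_piecewise_constant_additive(X, y)
print(np.round(shapes[0], 2))  # left bins sit roughly 1 unit below right bins
```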
Speaker
Brian Liu, Massachusetts Institute of Technology
Sepsis is a life-threatening condition affecting millions of individuals in the US each year. The complexity of sepsis clinical management makes individualized treatment approaches desirable. The University of Pittsburgh Medical Center (UPMC) has collected electronic health record data on sepsis patients from multiple hospitals. The goal of this study is to derive individualized decision rules (IDRs) that can be safely applied to, and uniformly improve decision-making across, hospitals in the UPMC Health System while using only a subset of hospitals for training. Traditional approaches assume that data are sampled from a single population of interest. With multiple hospitals that vary in patient populations, treatments, and provider teams, an IDR that is successful in one hospital may not be as effective in another, and the performance achieved by a globally optimal IDR may vary greatly across hospitals, preventing it from being safely applied to unseen hospitals. To address these challenges, as well as the practical restriction on data sharing across hospitals, we introduce a new objective function and a federated learning algorithm for learning IDRs that are robust to distributional uncertainty arising from heterogeneous data.
The proposed framework uses a conditional maximin objective to enhance individual outcomes across hospitals, ensuring robustness against hospital-level variations. Compared to the traditional approach, the proposed method improves the survival rate by 10 percentage points among patients who may experience extreme adverse outcomes across hospitals. Additionally, it increases the overall survival rate by 2-3 percentage points when the learned IDR is applied to unseen hospital populations.
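The sketch below illustrates the maximin idea in its simplest (unconditional) form: among candidate rules, pick the one whose worst-hospital mean outcome is largest. The function names and the `evaluate` placeholder are assumptions for illustration; the presented method instead uses a conditional maximin objective and a federated algorithm that never pools patient-level data across hospitals.

```python
import numpy as np

def maximin_rule_value(outcomes_by_hospital):
    """Worst-case (across hospitals) mean outcome achieved by a candidate
    rule; the maximin rule maximizes this quantity. Schematic only."""
    return min(np.mean(o) for o in outcomes_by_hospital)

def pick_maximin_rule(candidate_rules, hospital_datasets, evaluate):
    """Choose the candidate rule with the best worst-hospital value.
    `evaluate(rule, data)` is a placeholder that should return estimated
    outcomes under `rule` for one hospital's data (e.g., from an outcome
    model fitted without sharing patient-level data)."""
    best_rule, best_val = None, -np.inf
    for rule in candidate_rules:
        val = maximin_rule_value([evaluate(rule, d) for d in hospital_datasets])
        if val > best_val:
            best_rule, best_val = rule, val
    return best_rule, best_val

# Toy usage: two blanket rules evaluated on synthetic per-hospital outcomes
rng = np.random.default_rng(0)
hospitals = [rng.normal(loc=m, scale=1.0, size=200) for m in (0.2, 0.8, 0.5)]
evaluate = lambda rule, data: data + (0.3 if rule == "treat_all" else 0.0)
print(pick_maximin_rule(["treat_all", "treat_none"], hospitals, evaluate))
```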
Keywords
Conditional average treatment effect
Data integration
Distributionally robust learning
Decentralized data
Self-supervised contrastive learning (SSCL) is a rapidly advancing approach for learning data representations. However, a significant challenge in this paradigm is the feature suppression effect, where useful features for downstream tasks are suppressed due to dominant or easy-to-learn features overshadowing other class-relevant features, ultimately degrading the performance of SSCL models. While prior research has acknowledged the feature suppression effect, solutions with theoretical guarantees to mitigate this issue are still lacking.
In this work, we address the feature suppression problem by proposing a novel method, Fisher Contrastive Learning (FCL), which unbiasedly and exhaustively estimates the central sufficient dimension reduction function class in SSCL settings. In addition, whereas embedding dimensionality is often not preserved in practice, FCL empirically maintains it by maximizing the discriminative power of each linear classifier it learns. We demonstrate that, with our proposed method, class-relevant features are not suppressed by strong or easy-to-learn features on datasets known for strong feature suppression effects.
Furthermore, we show that Fisher Contrastive Learning consistently outperforms existing benchmark methods on standard image benchmarks, illustrating its practical advantages.
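For context, the sketch below shows the standard InfoNCE-style contrastive loss that SSCL methods commonly optimize; feature suppression arises when this objective can be driven down using only dominant or easy-to-learn features. It is a generic baseline written for illustration, not the proposed Fisher Contrastive Learning objective.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """Standard InfoNCE-style contrastive loss between two batches of
    embeddings of augmented views (z1[i] and z2[i] come from the same
    image). This is the common SSCL baseline whose shortcut solutions
    cause feature suppression; it is not the FCL objective."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
print(round(info_nce_loss(z + 0.01 * rng.normal(size=z.shape), z), 3))
```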
Keywords
self-supervised learning
contrastive learning
sufficient dimension reduction
feature suppression effect
data augmentation
Interactive data visualization has become a staple of modern data presentation. However, despite the abundance of interactive data visualization packages, creating rich and sophisticated interactive figures remains a challenging task. Notably, certain advanced interactive features, such as generalized linked selection, are lacking in many existing tools. This gap may stem from a subtle yet profound issue: the underlying interconnectedness of visualization components. Specifically, while many current systems based on the Grammar of Graphics strive to treat graphics, statistics, and interaction as independent, modular components that can be freely combined, I argue that this approach is inherently inadequate because of the deep links between all of these elements.
In this talk, I will explore the interconnectedness of interactive figure components and the challenges it presents for building interactive data visualization systems. I will also present plotscaper, a new R package developed to investigate and refine some of these ideas, as well as to provide a practical tool for data exploration. My goal is to convince you that, if we want to build truly general and robust interactive data visualization systems, we need to ground our thinking in some fundamental algebraic concepts, particularly ones from category theory.
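As a minimal, language-agnostic sketch of linked selection (written in Python for concreteness, and not showing the plotscaper API), the example below treats selection as shared state that every registered view reads and re-renders from; the class and method names are assumptions for illustration.

```python
class SelectionStore:
    """Minimal sketch of linked selection as shared state: every view
    reads the same set of selected case indices and re-renders when it
    changes. Generic illustration only; it does not show the plotscaper
    API or the algebraic machinery discussed in the talk."""
    def __init__(self, n_cases):
        self.selected = set()
        self.views = []
        self.n_cases = n_cases

    def register(self, view):
        self.views.append(view)

    def select(self, indices):
        # Clamp to valid case indices, then notify every linked view
        self.selected = set(indices) & set(range(self.n_cases))
        for view in self.views:
            view.render(self.selected)

class CountView:
    """A stand-in 'view' that just reports how many cases are selected."""
    def __init__(self, name):
        self.name = name
    def render(self, selected):
        print(f"{self.name}: {len(selected)} cases highlighted")

store = SelectionStore(n_cases=100)
store.register(CountView("scatterplot"))
store.register(CountView("histogram"))
store.select(range(10, 25))   # brushing in one plot updates every view
```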
Keywords
Interactive data visualization
Data visualization
R
Linked selection
Category theory