Thursday, Aug 7: 10:30 AM - 12:20 PM
4217
Contributed Papers
Music City Center
Room: CC-208A
Main Sponsor
Section on Statistical Learning and Data Science
Presentations
In many modern applications, there is a growing need to identify specific problematic entries within a dataset, referred to as cellwise outliers. These differ from the more commonly studied casewise outliers, in which entire rows are flagged as anomalous. While numerous statistical methods exist for detecting casewise outliers (also called anomaly detection or exception mining), relatively few methods address the challenge of pinpointing problematic values within individual observations. We propose a Mahalanobis distance-based chi-squared test statistic designed to detect cellwise outliers. Using Monte Carlo simulations, we evaluate the performance of our method against existing approaches across datasets generated from various multivariate distributions. Our results demonstrate that the proposed method is computationally efficient and often outperforms competing techniques in accurately identifying cellwise outliers under a wide range of conditions.
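A minimal sketch of one way a cell-level Mahalanobis-type statistic could work, assuming a Gaussian working model: each cell is scored by its squared residual conditional on the remaining cells in its row, and compared against a chi-squared cutoff with one degree of freedom. The conditional construction, function names, and cutoff below are illustrative assumptions, not the authors' exact statistic.

import numpy as np
from scipy import stats

def cellwise_flags(X, alpha=0.01):
    """Flag cells whose conditional squared standardized residual
    exceeds a chi-squared(1) cutoff (illustrative sketch only)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    P = np.linalg.inv(np.cov(X, rowvar=False))  # precision matrix
    cutoff = stats.chi2.ppf(1 - alpha, df=1)
    flags = np.zeros((n, p), dtype=bool)
    for j in range(p):
        # Gaussian conditional of column j given the other columns,
        # expressed through the precision matrix.
        others = [k for k in range(p) if k != j]
        cond_var = 1.0 / P[j, j]
        cond_mean = mu[j] - cond_var * (X[:, others] - mu[others]) @ P[j, others]
        flags[:, j] = (X[:, j] - cond_mean) ** 2 / cond_var > cutoff
    return flags

In a simulation, the returned flags can be compared against the locations of injected cellwise outliers to estimate true and false positive rates.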
Keywords
cellwise outliers
anomaly detection
exception mining
multivariate data
Mahalanobis distance
We study the problem of robust posterior inference when observed data are subject to adversarial contamination, such as outliers and distributional shifts. We introduce Distributionally Robust Variational Bayes (DRVB), a robust posterior sampling method based on solving a minimax variational Bayes problem over Wasserstein ambiguity sets. Computationally, our approach leverages gradient flows on probability spaces, where the choice of geometry is crucial for addressing different forms of adversarial contamination. We design and analyze the DRVB algorithm based on Wasserstein, Fisher-Rao, and hybrid Wasserstein-Fisher-Rao flows, highlighting their respective strengths in handling outliers, distributional shifts, and mixed global-local contamination. Our theoretical results establish robustness guarantees and polynomial-time convergence of each discretized gradient flow to its stationary measure. Empirical results show that DRVB outperforms naive Langevin Monte Carlo (LMC) in generating robust posterior samples across various adversarial contamination settings.
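For orientation, the naive LMC baseline the abstract compares against can be read as the Euler discretization of the Wasserstein gradient flow of the KL divergence to the posterior. A minimal sketch of that baseline follows; the DRVB algorithm itself, with its ambiguity sets and Fisher-Rao component, is not reproduced here.

import numpy as np

def langevin_mc(grad_log_post, x0, step=1e-3, n_steps=5000, rng=None):
    """Unadjusted Langevin Monte Carlo:
    x <- x + step * grad log pi(x) + sqrt(2 * step) * N(0, I).
    This is the naive baseline, not DRVB."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    samples = np.empty((n_steps,) + x.shape)
    for t in range(n_steps):
        x = x + step * grad_log_post(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        samples[t] = x
    return samples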
Keywords
Variational Bayes
Distributionally Robust Inference
Wasserstein-Fisher-Rao gradient flow
mixed global-local contamination
adversarial contamination
In practice, the use of rounding is ubiquitous. Although researchers have examined the implications of rounding continuous random variables, rounding may also be applied to functions of discrete random variables. For example, to infer the number of excess deaths due to falls after a national emergency, authorities may only provide a rounded average of deaths before and after the emergency started. Deaths from falling tend to be relatively low in most places, and such rounding may seriously affect inference on the change in the rate of deaths. In this paper, we study drawing inference on a parameter from the probability mass function of a non-negative discrete random variable Y when, for a rounding (coarsening) width h, we observe U = h[Y/h] as a proxy for Y. We show that the probability generating function of U, E(U), and Var(U) capture the effect of the coarsening of the support of Y. Theoretical properties are explored further under some probability distributions. Moreover, we introduce two relative-risk-of-rounding metrics to aid the numerical assessment of how sensitive the results may be to rounding. Under certain conditions, rounding has little impact. However, we also find scenarios where rounding can significantly affect statistical inference. The methods are applied to infer the probability of success of a binomial distribution and to estimate the excess deaths due to Hurricane Maria. The simple methods we propose can partially counter rounding error effects.
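A small numerical illustration of the coarsening operator, reading [.] as rounding to the nearest integer (one common convention; the paper's bracket may instead denote the floor). The binomial example and Monte Carlo comparison of moments are our own illustration, not the paper's analysis.

import numpy as np

def coarsen(y, h):
    """U = h * [y / h], with [.] taken here as nearest-integer rounding."""
    return h * np.rint(np.asarray(y) / h)

# Monte Carlo look at how the coarsening width h distorts E(U) and
# Var(U) relative to Y ~ Binomial(20, 0.15), where E(Y) = 3 and
# Var(Y) = 2.55.
rng = np.random.default_rng(0)
y = rng.binomial(n=20, p=0.15, size=100_000)
for h in (1, 2, 5, 10):
    u = coarsen(y, h)
    print(f"h={h:2d}  E(U)={u.mean():6.3f}  Var(U)={u.var():6.3f}")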
Keywords
rounding error
binning
Sheppard’s correction
discrete Fourier transform
excess deaths
probability generating function
Robust estimators for regression use non-convex objective functions to shield against the adverse effects of outliers. The non-convexity brings challenges, particularly in combination with penalization. Selecting hyper-parameters for the penalty is a critical task, with cross-validation (CV) the prevalent strategy in practice, performing well for convex estimators. Applied to robust estimators, however, CV often gives poor results due to the interplay between multiple local minima and the penalty. The best local minimum attained on the full sample may not be the minimum with the desired statistical properties. Furthermore, there may be a mismatch between this minimum and the minima attained in the CV folds. We introduce a novel adaptive CV strategy that tracks multiple minima for each combination of hyper-parameters and subsets of the data. A matching scheme is presented for correctly evaluating minima computed on the full sample using the corresponding minima from the CV folds. We show that the proposed strategy reduces the variability of the estimated performance metric, leads to smoother CV curves, and hence substantially increases the reliability of robust penalized estimators.
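A sketch of the matching idea under simple assumptions: local minima are represented by their coefficient vectors, and each full-sample minimum is paired with the nearest minimum found on a fold. The distance criterion and helper name are ours; the paper's matching scheme may differ.

import numpy as np

def match_minima(full_minima, fold_minima):
    """Pair each full-sample local minimum (a coefficient vector) with
    the closest local minimum from a CV fold, by Euclidean distance."""
    matched = []
    for beta_full in full_minima:
        dists = [np.linalg.norm(beta_full - beta) for beta in fold_minima]
        matched.append(fold_minima[int(np.argmin(dists))])
    return matched

Each full-sample minimum is then scored on held-out data using its matched fold minima, rather than whichever minimum the fold optimizer happened to return.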
Keywords
Robust regression
Hyper-parameter tuning
Cross-validation
Non-convex estimator
Penalized regression
When data from multiple tasks are contaminated by outliers, existing multitask learning methods lose efficiency. To address this issue, we propose a robust multitask feature learning method that combines adaptive Huber regression tasks with mixed regularization. The robustification parameters can be chosen to adapt to the sample size, model dimension, and moments of the error distribution, striking a balance between unbiasedness and robustness. We consider heavy-tailed distributions for multiple datasets that have a bounded (1 + ω)th moment for some ω > 0. Our method achieves estimation and sign recovery consistency. Additionally, we propose a robust information criterion for joint inference on related tasks, which can be used for consistent model selection. Through simulation studies and real data applications, we illustrate that the proposed method provides smaller estimation errors and higher feature selection accuracy than non-robust multitask learning and robust single-task methods.
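For reference, the Huber loss with robustification parameter tau is the building block of each task's adaptive Huber regression. The adaptive scaling rule in the comment below is one choice from the literature, not necessarily the paper's.

import numpy as np

def huber_loss(r, tau):
    """Huber loss: quadratic for |r| <= tau, linear beyond, so large
    residuals (outliers) exert only bounded influence."""
    a = np.abs(r)
    return np.where(a <= tau, 0.5 * r**2, tau * a - 0.5 * tau**2)

# An adaptive choice grows tau with sample size n and shrinks it with
# model dimension d, e.g. tau ~ c * (n / np.log(d))**0.5 when second
# moments exist; with only (1 + omega)-th moments the exponent changes.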
Keywords
Adaptive Huber Loss
Multitask Feature Learning
Robust M-estimation
Heavy-tailed Data Integration
Most anomaly detection methods assume that all training data are normal. In practice, more information may be available through limited samples of "known" anomalies. We wish to leverage this extra information to better detect both known and (potentially) unknown anomalies, without overfitting to only the known anomalies. To do so, we propose the first mathematical framework to formalize this goal: label-informed density level set estimation (LI-DLSE), a generalization of unsupervised anomaly detection. Our framework shows that solving a nonparametric binary classification problem can, in turn, solve the LI-DLSE task. We propose a neural network trained to classify normal data versus anomalies (both known and synthetic), proving that the excess risk converges to zero at a fast rate. Known anomalies guide model training with prior knowledge, while synthetic anomalies help detect unknown anomalies by labeling regions without normal data as the anomaly class. Experimental results corroborate our theory, demonstrating that synthetic anomalies mitigate overfitting to known anomalies while still allowing us to incorporate the additional information on known anomalies.
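A minimal sketch of the classification reduction, assuming synthetic anomalies drawn uniformly over a box around the normal data; the sampler, network architecture, and helper name are illustrative, not the authors' construction.

import numpy as np
from sklearn.neural_network import MLPClassifier

def fit_li_dlse(X_normal, X_known_anom, n_synth=1000, rng=None):
    """Label normal data 0, and both known and synthetic anomalies 1,
    then fit a neural classifier (sketch of the reduction only)."""
    rng = np.random.default_rng() if rng is None else rng
    # Synthetic anomalies: uniform over the bounding box of normal data,
    # so regions without normal observations get anomaly labels.
    lo, hi = X_normal.min(axis=0), X_normal.max(axis=0)
    X_synth = rng.uniform(lo, hi, size=(n_synth, X_normal.shape[1]))
    X = np.vstack([X_normal, X_known_anom, X_synth])
    y = np.r_[np.zeros(len(X_normal)), np.ones(len(X_known_anom) + n_synth)]
    return MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, y)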
Keywords
Deep Learning Theory
Anomaly Detection
Cybersecurity
LLM Safety
Classification
Artificial Neural Networks (ANNs) make predictions based on patterns learned during training. However, their performance can decline if the input data distribution shifts over time, causing the model's assumptions to become invalid. To maintain reliable predictions, it is crucial to monitor for these distribution changes and update or retrain the model when necessary to adapt to new data patterns.
To address this issue, we propose utilizing one-class classification techniques to monitor the latent feature representations, or "embeddings," produced by the ANN. One-class classification methods can detect shifts in the data stream by identifying deviations from established boundaries within the feature space. If new data points begin to fall outside these boundaries, it indicates a potential change in the underlying data distribution or the parameters of the neural network, which may impact model accuracy and highlight the need for retraining. This approach is evaluated by applying LS-SVDD and SVDD to a publicly available dataset and comparing their performance.
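A minimal sketch of the monitoring loop, using scikit-learn's OneClassSVM (with an RBF kernel this is closely related to SVDD; the paper's LS-SVDD variant would slot in the same way). The alarm threshold and the assumption that embeddings have already been extracted from the ANN are ours.

import numpy as np
from sklearn.svm import OneClassSVM

def fit_monitor(train_embeddings, nu=0.05):
    """Learn a one-class boundary around embeddings of in-distribution
    training data."""
    return OneClassSVM(kernel="rbf", nu=nu).fit(train_embeddings)

def drift_alarm(monitor, new_embeddings, threshold=0.10):
    """Raise an alarm when the fraction of new embeddings falling
    outside the learned boundary exceeds the threshold, signaling a
    possible distribution shift and a need to retrain."""
    outside_rate = (monitor.predict(new_embeddings) == -1).mean()
    return outside_rate > threshold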
Keywords
Embedding Layer
ANN
One-class Classification
LS-SVDD
SVDD