Anomaly Detection and Robust Modeling

Chair

Jing Ouyang, University of Michigan
Thursday, Aug 7: 10:30 AM - 12:20 PM
4217 
Contributed Papers 
Music City Center 
Room: CC-208A 

Main Sponsor

Section on Statistical Learning and Data Science

Presentations

Zooming in on Anomalies: An Approach to Detecting Cellwise Outliers

In many modern applications, there is a growing need to identify specific problematic entries within a dataset, referred to as cellwise outliers. These differ from the more commonly studied casewise outliers, where entire rows are flagged as anomalous. While numerous statistical methods exist for detecting casewise outliers (a task also called anomaly detection or exception mining), relatively few methods address the challenge of pinpointing problematic values within individual observations. We propose a Mahalanobis distance-based chi-squared test statistic designed to detect cellwise outliers. Using Monte Carlo simulations, we evaluate the performance of our method against existing approaches on datasets generated from various multivariate distributions. Our results demonstrate that the proposed method is computationally efficient and often outperforms competing techniques in accurately identifying cellwise outliers under a wide range of conditions.
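The abstract does not spell out the exact form of the statistic, so the sketch below uses one standard route from Mahalanobis distances to a per-cell chi-squared test: under multivariate normality, the squared standardized residual of each coordinate given the remaining coordinates of its row follows a chi-squared distribution with one degree of freedom. Function names and the plug-in mean/covariance estimates are illustrative, not the authors' method:

```python
import numpy as np
from scipy import stats

def cellwise_flags(X, alpha=0.01):
    """Flag cells whose conditional Mahalanobis contribution is extreme.

    Illustrative sketch: for each cell x_ij, the squared standardized
    residual of x_ij given the remaining coordinates of row i follows a
    chi-squared(1) distribution under multivariate normality, so cells
    exceeding the (1 - alpha) quantile are flagged.
    """
    n, p = X.shape
    mu = X.mean(axis=0)              # plain plug-in estimates; a robust
    S = np.cov(X, rowvar=False)      # start (e.g. MCD) would be advisable
    P = np.linalg.inv(S)             # precision matrix
    cutoff = stats.chi2.ppf(1 - alpha, df=1)
    flags = np.zeros((n, p), dtype=bool)
    Xc = X - mu
    for j in range(p):
        # conditional residual of coordinate j given the others:
        # x_j - E[x_j | x_-j], with conditional variance 1 / P[j, j]
        resid = Xc @ P[:, j] / P[j, j]
        flags[:, j] = resid**2 * P[j, j] > cutoff
    return flags
```

In practice the sample mean and covariance would be replaced by robust initial estimates, since contaminated cells distort the plug-ins themselves.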

Keywords

cellwise outliers

anomaly detection

exception mining

multivariate data

Mahalanobis distance

Co-Author

William Christensen, Brigham Young University

First Author

Jackson Passey

Presenting Author

Jackson Passey

Distributionally Robust Posterior Sampling - A Variational Bayes Approach

We study the problem of robust posterior inference when observed data are subject to adversarial contamination, such as outliers and distributional shifts. We introduce Distributionally Robust Variational Bayes (DRVB), a robust posterior sampling method based on solving a minimax variational Bayes problem over Wasserstein ambiguity sets. Computationally, our approach leverages gradient flows on probability spaces, where the choice of geometry is crucial for addressing different forms of adversarial contamination. We design and analyze DRVB algorithms based on Wasserstein, Fisher-Rao, and hybrid Wasserstein-Fisher-Rao flows, highlighting their respective strengths in handling outliers, distributional shifts, and mixed global-local contamination. Our theoretical results establish robustness guarantees and polynomial-time convergence of each discretized gradient flow to its stationary measure. Empirical results show that DRVB outperforms naive Langevin Monte Carlo (LMC) in generating robust posterior samples across various adversarial contamination settings.
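The DRVB minimax construction is not described in enough detail here to reproduce, but the baseline it is compared against is concrete: the unadjusted Langevin algorithm, which is exactly the time-discretized Wasserstein gradient flow of the KL divergence to the posterior. A minimal sketch of that baseline (interface and step size are illustrative):

```python
import numpy as np

def ula_sample(grad_log_post, x0, step=1e-3, n_steps=5000, rng=None):
    """Unadjusted Langevin algorithm (ULA): the time-discretization of the
    Wasserstein gradient flow of KL(q || posterior).  This is the 'naive
    LMC' baseline the abstract compares against; DRVB additionally solves
    a minimax problem over a Wasserstein ambiguity set (not shown here).
    """
    rng = np.random.default_rng(rng)
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    for t in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Langevin step: drift along the posterior score plus Gaussian noise
        x = x + step * grad_log_post(x) + np.sqrt(2 * step) * noise
        samples[t] = x
    return samples

# Toy usage: sample from a standard Gaussian posterior (score = -x).
draws = ula_sample(lambda x: -x, x0=np.zeros(2), step=1e-2, n_steps=2000)
```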

Keywords

Variational Bayes

Distributionally Robust Inference

Wasserstein-Fisher-Rao gradient flow

mixed global-local contamination

adversarial contamination 

Co-Author(s)

Bennett Zhu
David Blei, Columbia University-Data Science Institute

First Author

Bohan Wu, Columbia University

Presenting Author

Bennett Zhu

Estimating Counts Through an Average Rounded to an Integer and Its Theoretical & Practical Effects

In practice, the use of rounding is ubiquitous. Although researchers have studied the implications of rounding continuous random variables, rounding may also be applied to functions of discrete random variables. For example, to infer the number of excess deaths due to falls after a national emergency, authorities may only provide a rounded average of deaths before and after the emergency started. Deaths from falling tend to be relatively low in most places, and such rounding may seriously affect inference on the change in the rate of deaths. In this paper, we study drawing inference on a parameter from the probability mass function of a non-negative discrete random variable Y when, for rounding coarsening width h, we observe U = h[Y/h] as a proxy for Y. We show that the probability generating function of U, E(U), and Var(U) capture the effect of the coarsening of the support of Y. Theoretical properties are explored further under some probability distributions. Moreover, we introduce two relative-risk-of-rounding metrics to aid the numerical assessment of how sensitive results may be to rounding. Under certain conditions, rounding has little impact. However, we also find scenarios where rounding can significantly affect statistical inference. The methods are applied to infer the probability of success of a binomial distribution and to estimate the excess deaths due to Hurricane Maria. The simple methods we propose can partially counter rounding error effects.
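A quick simulation illustrates why coarsening matters for low counts, reading [·] as rounding to the nearest integer (the paper's bracket convention may differ); the Poisson model and all numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, h, n = 4.0, 5, 200_000          # low count rate, coarsening width h
Y = rng.poisson(lam, size=n)
U = h * np.round(Y / h)              # U = h[Y/h], reading [.] as rounding

print(f"E(Y)={Y.mean():.3f}  Var(Y)={Y.var():.3f}")
print(f"E(U)={U.mean():.3f}  Var(U)={U.var():.3f}")
```

When h is comparable to the typical count, the mean and variance of U can differ noticeably from those of Y, which is the distortion the abstract's metrics are designed to quantify.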

Keywords

rounding error

binning

Sheppard’s correction

discrete Fourier transform

excess deaths

probability generating function 

Co-Author(s)

Axel Cortes-Cubero, Protocol Labs
Israel Almodovar-Rivera, University of Puerto Rico at Mayaguez
Wolfgang Rolke, University of Puerto Rico RUM

First Author

Roberto Rivera, University of Puerto Rico-Mayaguez

Presenting Author

Roberto Rivera, University of Puerto Rico-Mayaguez

Hyper-Parameter Selection for Robust & Non-Convex Estimators via Information Sharing

Robust estimators for regression use non-convex objective functions to shield against the adverse effects of outliers. The non-convexity brings challenges, particularly in combination with penalization. Selecting hyper-parameters for the penalty is a critical task, with cross-validation (CV) the prevalent strategy in practice, one that performs well for convex estimators. Applied to robust estimators, however, CV often gives poor results due to the interplay between multiple local minima and the penalty. The best local minimum attained on the full sample may not be the minimum with the desired statistical properties. Furthermore, there may be a mismatch between this minimum and the minima attained in the CV folds. We introduce a novel adaptive CV strategy that tracks multiple minima for each combination of hyper-parameters and subsets of the data. A matching scheme is presented for correctly evaluating minima computed on the full sample using the corresponding minima from the CV folds. We show that the proposed strategy reduces the variability of the estimated performance metric, leads to smoother CV curves, and hence substantially increases the reliability of robust penalized estimators.
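The abstract describes the matching scheme only at a high level; the sketch below shows the core idea in runnable form, with a hypothetical solver interface fit(X, y, lam, start) standing in for the penalized robust estimator:

```python
import numpy as np

def matched_cv(fit, X, y, lambdas, starts, n_folds=5, seed=0):
    """Sketch of the matching idea: for each lambda, compute the local
    minima reachable on the full sample and on each CV fold (via multiple
    starting points), then score each full-sample minimum against the
    *closest* fold minimum rather than whatever minimum the fold's
    optimizer happened to find.

    `fit(X, y, lam, start) -> coef` is a user-supplied solver for the
    penalized robust estimator (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=len(y))
    scores = {}
    for lam in lambdas:
        full_minima = [fit(X, y, lam, s) for s in starts]
        errs = np.zeros(len(full_minima))
        for k in range(n_folds):
            tr, te = folds != k, folds == k
            fold_minima = [fit(X[tr], y[tr], lam, s) for s in starts]
            for i, beta in enumerate(full_minima):
                # match by distance in coefficient space
                j = np.argmin([np.linalg.norm(beta - b) for b in fold_minima])
                r = y[te] - X[te] @ fold_minima[j]
                errs[i] += np.median(np.abs(r))   # robust prediction loss
        scores[lam] = errs / n_folds
    return scores   # per-lambda CV error for each tracked minimum
```

The robust loss, the matching metric, and the set of starting points are all illustrative choices, not the paper's algorithm.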

Keywords

Robust regression

Hyper-parameter tuning

Cross-validation

Non-convex estimators

Penalized regression 

Co-Author

Siqi Wei, George Mason University

First Author

David Kepplinger, George Mason University

Presenting Author

David Kepplinger, George Mason University

Robust Multitask Feature Learning with Adaptive Huber Regressions

When data from multiple tasks are contaminated by outliers, existing multitask learning methods perform less efficiently. To address this issue, we propose a robust multitask feature learning method that combines adaptive Huber regression tasks with mixed regularization. The robustification parameters can be chosen to adapt to the sample size, model dimension, and moments of the error distribution, striking a balance between unbiasedness and robustness. We consider heavy-tailed distributions for multiple datasets that have bounded (1 + ω)th moment for any ω > 0. Our method achieves estimation and sign-recovery consistency. Additionally, we propose a robust information criterion to conduct joint inference on related tasks, which can be used for consistent model selection. Through simulation studies and real data applications, we illustrate that the proposed model provides smaller estimation errors and higher feature selection accuracy than non-robust multitask learning and robust single-task methods.
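As a minimal single-task illustration of the adaptive Huber idea, the sketch below fits a Huber regression with a robustification parameter scaled to the sample size and dimension; the specific scaling rule, solver, and learning rate are illustrative choices, and the paper's method additionally couples tasks through mixed regularization:

```python
import numpy as np

def huber_grad(r, tau):
    """Gradient of the Huber loss with respect to the residual r."""
    return np.where(np.abs(r) <= tau, r, tau * np.sign(r))

def adaptive_huber_fit(X, y, tau=None, lr=0.1, n_iter=500):
    """Single-task adaptive Huber regression by gradient descent.
    If tau is None, use tau = sigma_hat * sqrt(n / log(n * d)), one common
    sample-size/dimension-adaptive scaling (an illustrative choice, not
    necessarily the paper's rule).  Assumes standardized predictors.
    """
    n, d = X.shape
    if tau is None:
        sigma_hat = np.std(y - y.mean())
        tau = sigma_hat * np.sqrt(n / np.log(n * d))
    beta = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ beta
        beta += lr * X.T @ huber_grad(r, tau) / n   # descent step
    return beta, tau
```

Large tau recovers least squares (small bias, little robustness); small tau approaches least absolute deviations (robust, more bias), which is the trade-off the adaptive rule balances.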

Keywords

Adaptive Huber Loss

Multitask Feature Learning

Robust M-estimation

Heavy-tailed Data Integration 

Co-Author(s)

Wei Xu, University of Toronto
Xin Gao, York University

First Author

Yuan Zhong

Presenting Author

Yuan Zhong

The Theory and Practice of Anomaly Detection Using Neural Networks: Benefit of Synthetic Data

Most anomaly detection methods assume that all training data are normal. In practice, more information may be available through limited samples of "known" anomalies. We wish to leverage this extra information to detect both known and (potentially) unknown anomalies better, while not overfitting to only the known anomalies. To do so, we propose the first mathematical framework to formalize this goal: label-informed density level set estimation (LI-DLSE), a generalization of unsupervised anomaly detection. Our framework shows that solving a nonparametric binary classification problem can, in turn, solve the LI-DLSE task. We propose a neural network trained to classify normal data versus anomalies (both known and synthetic), and prove that its excess risk converges to zero at a fast rate. Known anomalies guide model training with prior knowledge, while synthetic anomalies help detect unknown anomalies by labeling regions without normal data as the anomaly class. Experimental results corroborate our theory by demonstrating that synthetic anomalies mitigate overfitting to known anomalies while allowing us to incorporate additional information about them.
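The recipe in the last two sentences can be illustrated directly: draw synthetic anomalies from regions without normal data (here, uniformly over an inflated bounding box) and train a binary classifier on normal data versus known-plus-synthetic anomalies. Data, architecture, and parameter choices below are illustrative, not the paper's setup:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_norm = rng.normal(0, 1, size=(2000, 5))     # "normal" training data
X_known = rng.normal(4, 0.5, size=(50, 5))    # few known anomalies

# Synthetic anomalies: uniform over an inflated bounding box of the
# normal data, labeling regions without normal data as anomalous.
lo, hi = X_norm.min(0) - 2, X_norm.max(0) + 2
X_syn = rng.uniform(lo, hi, size=(2000, 5))

X = np.vstack([X_norm, X_known, X_syn])
y = np.r_[np.zeros(len(X_norm)), np.ones(len(X_known) + len(X_syn))]

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300).fit(X, y)
scores = clf.predict_proba(X_norm)[:, 1]      # anomaly scores on normal data
```

Thresholding the classifier's probability then yields an estimated density level set, which is the connection the framework formalizes.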

Keywords

Deep Learning Theory

Anomaly Detection

Cybersecurity

LLM Safety

Classification 

Co-Author(s)

Matthew Lau, Georgia Institute of Technology
Xiaoming Huo, Georgia Institute of Technology, School of Industrial & Systems Engineering
Jizhou Chen, Georgia Institute of Technology
Xiangchi Yuan, Georgia Institute of Technology
Wenke Lee, Georgia Institute of Technology

First Author

Tian-Yi Zhou

Presenting Author

Tian-Yi Zhou

Anomaly Detection in Neural Networks via One-Class Support Vector Methods

Artificial Neural Networks (ANNs) make predictions based on patterns learned during training. However, their performance can decline if the input data distribution shifts over time, causing the model's assumptions to become invalid. To maintain reliable predictions, it is crucial to monitor for these distribution changes and update or retrain the model when necessary to adapt to new data patterns.

To address this issue, we propose utilizing one-class classification techniques to monitor the latent feature representations, or "embeddings," produced by the ANN. One-class classification methods can detect shifts in the data stream by identifying deviations from established boundaries within the feature space. If new data points begin to fall outside these boundaries, it indicates a potential change in the underlying data distribution or the parameters of the neural network, which may impact model accuracy and highlight the need for retraining. This approach is evaluated by applying LS-SVDD and SVDD to a publicly available dataset and comparing their performance. 
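A minimal sketch of this monitoring loop, using scikit-learn's OneClassSVM as a stand-in (with an RBF kernel it is equivalent to SVDD; LS-SVDD, which the abstract also evaluates, is not shown). Here embed is a hypothetical function returning the network's penultimate-layer activations:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_monitor(embed, X_ref, nu=0.05, gamma="scale"):
    """Fit a one-class boundary on the ANN's embeddings of reference data.
    OneClassSVM with an RBF kernel is equivalent to SVDD, so it serves as
    a stand-in for the methods compared in the abstract.
    """
    ocsvm = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
    ocsvm.fit(embed(X_ref))
    return ocsvm

def drift_alarm(monitor, embed, X_new, tol=0.15):
    """Alarm if too many new embeddings fall outside the learned boundary,
    signaling a potential distribution shift and the need to retrain."""
    outside = monitor.predict(embed(X_new)) == -1
    return outside.mean() > tol

# `embed` is a hypothetical hook into the trained network, e.g.
# embed = lambda X: hidden_model.predict(X) for a penultimate-layer model.
```

The tolerance tol trades false alarms against detection delay and would be calibrated on held-out in-distribution data.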

Keywords

Embedding Layer

ANN

One-class Classification

LS-SVDD

SVDD 

Co-Author

Poorna Sandamini Senaratne, University of Central Florida

First Author

Edgard M. Maboudou-Tchao, University of Central Florida

Presenting Author

Poorna Sandamini Senaratne, University of Central Florida