The Theory and Practice of Anomaly Detection using Neural Networks: Benefit of Synthetic Data
Xiaoming Huo
Co-Author
Georgia Institute of Technology, School of Industrial & Systems Engineering
Wenke Lee
Co-Author
Georgia Institute of Technology
Thursday, Aug 7: 11:50 AM - 12:05 PM
2294
Contributed Papers
Music City Center
Most anomaly detection methods assume that all training data are normal. In practice, additional information may be available through a limited sample of "known" anomalies. We wish to leverage this extra information to better detect both known and (potentially) unknown anomalies, without overfitting to the known anomalies alone. To this end, we propose the first mathematical framework to formalize this goal: label-informed density level set estimation (LI-DLSE), a generalization of unsupervised anomaly detection. Our framework shows that solving a nonparametric binary classification problem in turn solves the LI-DLSE task. We propose a neural network trained to classify normal data versus anomalies (both known and synthetic), and prove that its excess risk converges to zero at a fast rate. Known anomalies guide model training with prior knowledge, while synthetic anomalies help detect unknown anomalies by labeling regions without normal data as the anomaly class. Experimental results corroborate our theory, demonstrating that synthetic anomalies mitigate overfitting to known anomalies while allowing us to incorporate additional information about them.
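The classification-based recipe in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: a 2-D Gaussian stands in for normal data, a small far-away cluster for "known" anomalies, uniform samples over a bounding box for synthetic anomalies, and scikit-learn's `MLPClassifier` for the neural network.

```python
# Minimal sketch (assumed setup, not the paper's implementation):
# train a neural net to classify normal data (label 0) versus
# known + synthetic anomalies (label 1), then read its predicted
# anomaly probability as an anomaly score.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Normal data: 2-D Gaussian around the origin (label 0).
X_normal = rng.normal(0.0, 1.0, size=(500, 2))

# A few "known" anomalies clustered far from the normal mass (label 1).
X_known = rng.normal(5.0, 0.3, size=(20, 2))

# Synthetic anomalies: uniform over a box covering the data region (label 1).
# Labeling regions with no normal data as anomalous is what discourages
# overfitting to the known-anomaly cluster alone.
X_synth = rng.uniform(-8.0, 8.0, size=(500, 2))

X = np.vstack([X_normal, X_known, X_synth])
y = np.concatenate([np.zeros(len(X_normal)),
                    np.ones(len(X_known) + len(X_synth))])

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                    random_state=0).fit(X, y)

# Anomaly scores: low near the normal mass, high in empty regions,
# even far from the known-anomaly cluster.
scores = clf.predict_proba(np.array([[0.0, 0.0], [7.0, -7.0]]))[:, 1]
```

Note that `[7.0, -7.0]` lies nowhere near the known anomalies at roughly `(5, 5)`; it is flagged only because synthetic anomalies populated that empty region, which is the mechanism the abstract credits for detecting unknown anomalies.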
Deep Learning Theory
Anomaly Detection
Cybersecurity
LLM Safety
Classification
Main Sponsor
Section on Statistical Learning and Data Science