55: Mitigating Data Imbalance in Credit Card Fraud Detection
Tuesday, Aug 5: 10:30 AM - 12:20 PM
1276
Contributed Posters
Music City Center
Credit card fraud poses a significant challenge and leads to substantial financial losses. Although machine learning and deep learning models have been studied extensively in this domain, few studies address data imbalance, which can bias predictions toward the majority (legitimate) class. In this paper, we explore techniques to address data imbalance, including the Synthetic Minority Oversampling Technique (SMOTE), simple oversampling, and Variational Autoencoders (VAE). These methods are evaluated using metrics suited to imbalanced datasets. In real-world settings, there is often a trade-off between recall and precision, and both directly affect revenue.
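The following is a minimal pure-Python sketch that makes the difference between simple oversampling and SMOTE concrete. It is not the authors' implementation. Simple oversampling duplicates existing minority (fraud) rows. SMOTE instead creates new rows by interpolating between a minority point and one of its k nearest minority neighbours. The function names `random_oversample` and `smote`, the default `k=5`, and the brute-force neighbour search are illustrative assumptions. A production pipeline would more likely use a library such as imbalanced-learn.

```python
import math
import random


def random_oversample(minority, n_new, seed=0):
    """Simple oversampling: return n_new copies of randomly chosen minority rows."""
    rng = random.Random(seed)
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote(minority, n_new, k=5, seed=0):
    """Minimal SMOTE sketch (hypothetical helper): interpolate between a minority
    row and a random one of its k nearest minority neighbours (Euclidean, brute force)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding the row itself
        neighbours = sorted(
            (row for j, row in enumerate(minority) if j != i),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    fraud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy fraud rows
    print(random_oversample(fraud, n_new=2))
    print(smote(fraud, n_new=3, k=2))
```

Every synthetic SMOTE row lies on a segment between two real fraud rows, so it stays close to the observed minority distribution. This is consistent with the abstract's observation that SMOTE produces distributionally similar data. One plausible reason for its recall bias is that these interpolated points widen the region a classifier labels as fraud.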
Our preliminary results show that SMOTE is biased toward recall (0.897) at the expense of precision (0.098), although it generates synthetic data that is distributionally similar to the original. VAE, by contrast, achieves better precision (0.903) and generalizability. Combining VAE-generated data with a baseline logistic regression model significantly improves performance, reaching a ROC-AUC of 0.978 and offering a computationally efficient solution for large-scale fraud detection on imbalanced datasets. This study highlights the trade-offs among these techniques and provides a practical approach to fraud detection.
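The abstract reports precision, recall, and ROC-AUC rather than accuracy. On heavily imbalanced data, a model that labels every transaction as legitimate can still achieve high accuracy. The sketch below computes these metrics in pure Python on a toy example. The 0.5 decision threshold and the toy labels and scores are assumptions for illustration and do not reproduce the study's results. ROC-AUC is computed as the probability that a randomly chosen fraud case receives a higher score than a randomly chosen legitimate one, with ties counted as one half.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (fraud) class, labelled 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """ROC-AUC as P(score of a random positive > score of a random negative),
    with ties counted as 0.5 (the Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 8 legitimate (0) and 2 fraudulent (1) transactions with model scores.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.35, 0.7, 0.8, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(precision, 3), recall, auc)  # prints: 0.667 1.0 0.9375
```

Moving the threshold changes precision and recall in opposite directions. This mirrors the kind of trade-off the abstract observes between SMOTE (high recall, low precision) and VAE (high precision). ROC-AUC, by contrast, summarizes ranking quality across all thresholds.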
Fraud Detection
Synthetic Data
Machine Learning
Neural Network
Deep Learning
Main Sponsor
Section on Statistical Learning and Data Science