
55: Mitigating Data Imbalance in Credit Card Fraud Detection

Presented During: Contributed Poster Presentations: Section on Statistical Learning and Data Science

Yisong Chen Co-Author

Chuanhao Nie Co-Author

Yixin Xu Co-Author

Chuqing Zhao First Author
Harvard University

Chuqing Zhao Presenting Author
Harvard University

Tuesday, Aug 5: 10:30 AM - 12:20 PM
1276
Contributed Posters

Music City Center

Credit card fraud poses a significant challenge and leads to substantial financial losses. Although machine learning and deep learning models have been extensively studied in this domain, few studies address the issue of data imbalance, which can bias predictions. In this paper, we explore techniques to address data imbalance, including the Synthetic Minority Oversampling Technique (SMOTE), simple oversampling, and Variational Autoencoders (VAE). These methods are evaluated using metrics tailored for imbalanced datasets. In real-world scenarios, there is often a trade-off between recall and precision, both of which significantly impact revenue.
Our preliminary results show that SMOTE favors recall (0.897) over precision (0.098) but generates synthetic data that is distributionally similar to the real data, while VAE achieves better precision (0.903) and generalizability. Combining VAE-generated data with a baseline logistic regression significantly improves performance, reaching a ROC-AUC of 0.978 and offering a computationally efficient solution for large-scale fraud detection in imbalanced datasets. This study highlights the trade-offs between different techniques and provides a practical solution for fraud detection.
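
As an illustration of the ideas named in the abstract, the following is a minimal sketch of simple oversampling, a SMOTE-style interpolation step, and the precision/recall computation. It is not the authors' implementation: the function names (`random_oversample`, `smote_like`, `precision_recall`), the neighbor count `k`, and the toy data are all assumptions made for this example, and it uses only the Python standard library rather than any particular machine learning package.

```python
import random


def random_oversample(X, y, minority_label=1, seed=0):
    """Simple oversampling: duplicate random minority rows until classes match.

    Returns new lists; the inputs are not modified.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority_count = sum(1 for label in y if label != minority_label)
    n_extra = max(0, majority_count - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return X + extra, y + [minority_label] * len(extra)


def smote_like(X, y, n_new, minority_label=1, k=2, seed=0):
    """SMOTE-style step (illustrative): create points on the segment between
    a minority sample and one of its k nearest minority neighbors.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbors)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: six legitimate (0) and three fraudulent (1) rows.
X_toy = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0],
         [0.2, 0.3], [2.0, 2.0], [2.2, 1.9], [1.9, 2.1]]
y_toy = [0] * 6 + [1] * 3

X_bal, y_bal = random_oversample(X_toy, y_toy)
new_points = smote_like(X_toy, y_toy, n_new=3)
```

In a sketch like this, resampling would normally be applied to the training split only, so that precision and recall are measured on data with the original class ratio. A classifier that flags many transactions as fraud can reach high recall while most of its alerts are false positives, which is the kind of recall/precision trade-off the abstract describes.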

Keywords

Fraud Detection

Synthetic Data

Machine Learning

Neural Network

Deep Learning

Main Sponsor

Section on Statistical Learning and Data Science