Classification Methods and Applications

Zekun Wang Chair
Johns Hopkins University
 
Thursday, Aug 7: 10:30 AM - 12:20 PM
4219 
Contributed Papers 
Music City Center 
Room: CC-208B 

Main Sponsor

Section on Statistical Learning and Data Science

Presentations

A Deep Learning-Based Alternate Iteration Algorithm for One-Class Classification

One-class classification (OCC) is a specialized machine learning approach designed for scenarios in which only data from a single class (the target class) are available for training and any other points are considered outliers. Support Vector Data Description (SVDD) addresses OCC by finding a hypersphere that encloses the target class. In this research, we establish a new method that integrates a deep neural network with Least Squares Support Vector Data Description (LS-SVDD) to perform one-class classification by learning a feature space that encloses the target data within a minimal hypersphere. The parameters are optimized using an alternating iterative algorithm, ensuring both high accuracy and fast convergence. With the network weights fixed, the neural network's output serves as input to the LS-SVDD, where the center and radius are determined. The neural network parameters are then updated through backpropagation. This approach allows us to refine the model iteratively, leading to more precise parameter estimation and enhanced anomaly detection. To evaluate the performance, several publicly available real-world datasets were used.
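The alternating scheme described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the "network" is a single linear layer, and the center/radius step uses the mean embedding and mean squared distance as a simplified stand-in, not the LS-SVDD solution the authors use. The names `fit_center_radius`, `alternate_fit`, and `anomaly_score` are hypothetical.

```python
import numpy as np

def fit_center_radius(Z):
    """Stand-in for the LS-SVDD step (assumption): center = mean embedding,
    squared radius = mean squared distance to that center."""
    c = Z.mean(axis=0)
    r2 = ((Z - c) ** 2).sum(axis=1).mean()
    return c, r2

def alternate_fit(X, dim=2, n_rounds=20, lr=0.01, seed=0):
    """Alternate between (1) fitting center/radius with weights fixed and
    (2) one gradient step on the weights with the center fixed."""
    rng = np.random.default_rng(seed)
    # Hypothetical one-layer linear "network": embedding Z = X @ W
    W = rng.normal(scale=0.5, size=(X.shape[1], dim))
    history = []
    for _ in range(n_rounds):
        Z = X @ W
        c, r2 = fit_center_radius(Z)          # step 1: weights fixed
        history.append(r2)
        # step 2: center fixed; gradient of mean ||X W - c||^2 with respect to W
        grad = 2.0 * X.T @ (Z - c) / len(X)
        W = W - lr * grad
    c, r2 = fit_center_radius(X @ W)          # final center/radius for the last weights
    history.append(r2)
    return W, c, r2, history

def anomaly_score(X, W, c, r2):
    """Positive score: the embedded point lies outside the fitted hypersphere."""
    return ((X @ W - c) ** 2).sum(axis=1) - r2
```

This sketch has no safeguard against the embedding shrinking toward a single point, so it only illustrates the order of the two update steps, not a usable detector.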

Keywords

One Class Classification

LS-SVDD

Deep Learning 

Co-Author

Shahd Alnofaie, University of Central Florida

First Author

Edgard M. Maboudou-Tchao, University of Central Florida

Presenting Author

Shahd Alnofaie, University of Central Florida

Classification of Return on Equity (ROE) Using Machine Learning Techniques

Return on Equity (ROE) is one of the financial ratios most closely watched by shareholders and potential investors. Negative ROE can communicate a negative message to investors. It is important to find the financial ratios that influence ROE and the best machine learning technique for predicting it. We thus used four machine learning techniques (Naive Bayes, Logistic Regression, Random Forest and K-Nearest Neighbour) to identify the determinants and to predict the ROE. The imbalanced data were sourced from the Integrated Real-time Equity System (IRESS), which comprises all companies on the Johannesburg Stock Exchange (JSE) that were listed in 2019. The imbalanced data were balanced using original observations from previous years and using the SMOTE and ROSE oversampling methods. The model evaluation metrics used were sensitivity, specificity, precision, F1 score and accuracy. The identified predictors were net profit margin (NPM), interest cover (IC), earnings per share (EPS), earnings yield (EY) and price per earnings (PPE). Random Forest dominated performance on all datasets and performed well even on the imbalanced dataset.
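The core idea behind SMOTE-style oversampling, as mentioned in this abstract, can be sketched as follows: each synthetic minority observation is placed at a random point on the segment between a minority observation and one of its nearest minority neighbours. This is a simplified illustration, not the implementation the authors used; the function name `smote_like` and its parameters are hypothetical.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority rows by interpolating between a
    minority point and one of its k nearest minority neighbours.
    Requires k < number of minority rows."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    # Pairwise squared Euclidean distances among minority points
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)     # which minority point to start from
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    u = rng.random((n_new, 1))                # interpolation weight in [0, 1)
    return X_min[base] + u * (X_min[picked] - X_min[base])
```

Because every synthetic row is a convex combination of two real minority rows, the new points stay inside the range already covered by the minority class.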

Keywords

Returns on equity (ROE)

Machine Learning techniques

SMOTE and ROSE oversampling

Machine learning classifiers

ROE predictors

IRESS 

Co-Author

Silas Ntshani, University of South Africa

First Author

John Olaomi, University of South Africa

Presenting Author

John Olaomi, University of South Africa

Detection of Autism Spectrum Disorder Using Attention Based Graph Convolutional Network

Autism Spectrum Disorder (ASD) is a neurodevelopmental condition marked by atypical brain connectivity. This study presents a novel computational framework that utilizes an Attention-Based Graph Convolutional Network (GCN) to detect ASD. We use functional Magnetic Resonance Imaging (fMRI) data from the Autism Brain Imaging Data Exchange repository to construct functional connectivity matrices based on Pearson correlation, which captures the interactions among the brain regions defined by the AAL atlas. Connectivity matrices are transformed into graph representations, where the nodes represent brain regions and the edges encode functional connections. The Attention-Based GCN employs attention mechanisms to identify crucial connectivity patterns, enhancing both interpretability and diagnostic accuracy. The proposed framework achieves an accuracy of 90.57%, precision of 85.90%, and recall of 95.53%, outperforming existing results. This study not only advances the detection of ASD but also underscores the broader potential of Attention-Based GCNs in analyzing complex relational data across various other applications.
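The step from regional time series to a graph can be sketched as below. The Pearson correlation matrix follows the description in the abstract; the thresholding rule used to decide which edges exist is an assumption for illustration, since the abstract does not say how edges are selected. The function name `connectivity_graph` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def connectivity_graph(ts, threshold=0.5):
    """ts has shape (timepoints, regions).
    Returns the Pearson correlation matrix and a binary adjacency matrix."""
    corr = np.corrcoef(ts, rowvar=False)      # regions x regions correlations
    # Assumed edge rule: keep a connection when |correlation| >= threshold
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                # no self-loops
    return corr, adj
```

Each row of `corr` could then serve as the feature vector of the corresponding node, with `adj` supplying the edges, though other node features and edge rules are equally plausible.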

Keywords

ASD

Attention-Based Graph Convolutional Network

fMRI

Machine Learning

functional connectivity 

Co-Author

Abigail Kelly, Middle Tennessee State University

First Author

Ramchandra Rimal, Middle Tennessee State University

Presenting Author

Ramchandra Rimal, Middle Tennessee State University

Downscaling and Predicting Downward Shortwave Radiation

Accurately predicting Downward Shortwave Radiation (DSWR) is important for renewable energy, agriculture, and environmental studies. Global datasets provide DSWR estimates at coarse resolutions but often lack the localized precision required for tasks like energy system planning. This study introduces NN-XGBoost, a novel method that combines nearest-neighbor smoothing with the predictive power of eXtreme Gradient Boosting (XGBoost) to enhance accuracy in downscaling and predicting DSWR.

The proposed model leverages global DSWR data from Open-Meteo and local observations from Ambient Weather. Two prediction strategies are examined: (1) using a single local variable and (2) using multiple local variables. Results show that NN-XGBoost consistently outperforms both XGBoost and ARIMAX, achieving lower error (RMSE) and higher accuracy (\(R^2\)). This method provides a practical and scalable approach to improving DSWR forecasting and has significant applications in renewable energy planning, environmental monitoring, and agricultural decision-making. 
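For reference, the two metrics named in this abstract can be computed as in the sketch below, using their standard definitions. The function names are hypothetical and the code is not taken from the authors' work.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: lower is better."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean of the observations has an R^2 of 0, so values closer to 1 indicate that more of the variation in DSWR is explained.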

Keywords

Downward Shortwave Radiation

Nearest-Neighbor XGBoost

Forecast 

Co-Author(s)

Shadrack Asiedu, Department of Electrical and Computer Science, South Dakota State University
Abhilasha Suvedi, South Dakota State University
Hossein Moradi Rekabdarkolaee, South Dakota State University

First Author

Shree Nyaupane, South Dakota State University

Presenting Author

Shree Nyaupane, South Dakota State University

Enhancing Credit Risk Assessment Through Machine Learning: A Behavioral Scoring Approach

Artificial intelligence and machine learning advancements have transformed decision-making landscapes in the financial industry. With the advent of more complex credit products, the need for robust and innovative credit risk management is more critical than ever.
We harness behavioral scoring insights to develop a machine learning model for credit risk management, thus providing deeper borrower profiling. Data collection was done through focus group interviews and secondary sources. Behavioral data were analyzed to identify patterns, while financial data underwent preprocessing and feature engineering to ensure compatibility with machine learning algorithms.
Machine learning models, including logistic regression, support vector machines, K-nearest neighbors, decision trees, extreme gradient boosting, light gradient boosting, and CatBoost, were trained and evaluated for accuracy, precision, recall, and F1-score. The results demonstrated the effectiveness of ensemble methods, particularly CatBoost, which outperformed other models with an accuracy of 0.87, a precision of 0.88, a recall of 0.86, and an F1-score of 0.87. 
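The four reported metrics are linked by standard formulas, which the sketch below makes explicit. The function names are hypothetical, and the confusion-matrix counts in the usage note are made up for illustration, not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Consistency check on the reported CatBoost figures
print(round(f1_from(0.88, 0.86), 2))  # prints 0.87
```

With illustrative counts such as `classification_metrics(tp=8, fp=2, fn=2, tn=8)`, all four metrics come out at 0.8. The check on the last line shows that the reported F1-score of 0.87 agrees with the reported precision and recall.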

Keywords

Behavioral scoring

Machine Learning

Credit Risk Assessment

CatBoost

Borrower profiling

Ensemble methods 

Co-Author(s)

Chika Yinka-Banjo, University of Lagos
Mary Akinyemi, Austin Peay State University

First Author

Omokhoba Blessing Yama, University of Lagos

Presenting Author

Omokhoba Blessing Yama, University of Lagos

Multi-label Random Subspace Ensemble Classification

In this work, we develop a new ensemble learning framework, multi-label Random Subspace Ensemble (mRaSE), for multi-label classification problems. Given a base classifier (e.g., multinomial logistic regression, classification tree, K-nearest neighbors), mRaSE works by first randomly sampling a collection of subspaces, then choosing the best ones that achieve the minimum cross-validation errors, and finally aggregating the chosen weak learners. In addition to its superior prediction performance, mRaSE also provides a model-free feature ranking depending on the given base classifier. An iterative version of mRaSE is also developed to further improve the performance. A model-free extension is pursued on the iterative version, leading to the so-called Super mRaSE, which accepts a collection of base classifiers as input to the algorithm. We show that the proposed algorithms compare favorably with state-of-the-art classification algorithms, including random forests and deep neural networks, via extensive simulation studies and two real data applications. The new algorithms are implemented in an updated version of the R package RaSEn.
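The three steps named in this abstract (sample random subspaces, keep the one with the lowest cross-validation error, aggregate the chosen weak learners) can be sketched as below. This is an illustration under several assumptions and is not the RaSEn implementation: the base classifier is a nearest-centroid rule, labels are integers 0..K-1 with every class present in every training fold, aggregation is a majority vote, and the feature ranking is the selection frequency. All function names and parameters are hypothetical.

```python
import numpy as np

def centroid_fit(X, y, n_classes):
    """Base classifier (assumed): one centroid per class."""
    return np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])

def centroid_predict(centroids, X):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def cv_error(X, y, n_classes, folds):
    """Mean misclassification rate over the given held-out folds."""
    errs = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        cents = centroid_fit(X[train], y[train], n_classes)
        errs.append(np.mean(centroid_predict(cents, X[test_idx]) != y[test_idx]))
    return float(np.mean(errs))

def rase_fit(X, y, n_learners=15, n_candidates=20, max_dim=2, n_folds=3, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_classes = int(y.max()) + 1
    folds = np.array_split(rng.permutation(n), n_folds)
    learners = []
    for _ in range(n_learners):
        best = None
        for _ in range(n_candidates):
            # Step 1: sample a random subspace (feature subset)
            size = int(rng.integers(1, max_dim + 1))
            S = np.sort(rng.choice(p, size=size, replace=False))
            # Step 2: keep the candidate with the lowest cross-validation error
            err = cv_error(X[:, S], y, n_classes, folds)
            if best is None or err < best[0]:
                best = (err, S)
        S = best[1]
        learners.append((S, centroid_fit(X[:, S], y, n_classes)))
    return learners, n_classes

def rase_predict(learners, n_classes, X):
    """Step 3: aggregate the chosen weak learners by majority vote."""
    votes = np.zeros((len(X), n_classes))
    for S, cents in learners:
        votes[np.arange(len(X)), centroid_predict(cents, X[:, S])] += 1
    return votes.argmax(axis=1)

def feature_frequency(learners, p):
    """Share of chosen subspaces that contain each feature."""
    counts = np.zeros(p)
    for S, _ in learners:
        counts[S] += 1
    return counts / len(learners)
```

Features that appear in many of the selected subspaces receive a high frequency, which is one plausible way to obtain a ranking that depends only on the base classifier's cross-validation behaviour.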

Keywords

multi-label classification

ensemble subspace

feature ranking 

Co-Author(s)

Fan Bi, New York University
Yang Feng, New York University

First Author

Jianan Zhu, New York University

Presenting Author

Jianan Zhu, New York University

Tackling Challenges in Development and Validation of Predictive Models for Class Imbalanced Data

Small bowel obstruction (SBO) is common in the emergency department (ED). Operative management (OM) within 24 hours is recommended for suspected ischemia or treatment failure, though 80% of cases improve with nonoperative care (NOC). We developed logistic regression (LR) and random forest (RF) models to predict NOC in patients admitted to the ED. This secondary analysis uses data from a multicenter retrospective study of SBO patients diagnosed by CT at 10 EDs. Seventy percent of the data were used for training and 30% for testing, with stratification to maintain the case-control ratio, and both RF and LR models were weighted for class imbalance. We selected physical exam features, WBC count, creatinine, lactic acid, history of malignancy, and hernia from clinically/statistically significant features using stepwise regression. Of 1419 patients with a history of SBO confirmed by CT imaging, 6% required OM. The AUROC for LR was 0.68 (95% CI: 0.56-0.79) vs. 0.56 (95% CI: 0.45-0.67) for RF, with no significant difference (P=0.176). The misclassification rate was 42% for LR vs. 43% for RF. Statistical models can help triage SBO patients in the ED, though misclassification persists; cutoff values for 95% sensitivity are provided.
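One way to obtain a cutoff that reaches a target sensitivity, as mentioned at the end of this abstract, is sketched below. It is an illustration only: the function name is hypothetical, the positive class is whichever outcome the analyst codes as 1, and the abstract does not describe how the authors derived their cutoffs.

```python
import numpy as np

def cutoff_for_sensitivity(scores, labels, target=0.95):
    """Largest threshold t such that the rule `score >= t` classifies at
    least `target` of the positives (label 1) as positive.
    Returns (threshold, achieved sensitivity, achieved specificity)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = np.sort(scores[labels == 1])[::-1]      # positive scores, descending
    k = int(np.ceil(target * len(pos)))           # positives that must be captured
    t = pos[k - 1]
    pred = scores >= t
    sensitivity = pred[labels == 1].mean()
    specificity = (~pred)[labels == 0].mean()
    return t, sensitivity, specificity
```

Fixing sensitivity this high on a weakly discriminating model typically costs a great deal of specificity, which is consistent with the misclassification rates the abstract reports.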

Keywords

Logistic Regression

Random Forest

Decision Trees

Predictive Models

Small Bowel Obstruction

Non-Operative Management 

Co-Author(s)

David Hoaglin
Jonathan Abelson, Lahey Hospital and Medical Center
Steven Stain, Lahey Hospital and Medical Center
Hamid Shokoohi, Harvard Medical School
Nicole Duggan, Brigham and Women’s Hospital
Charles Brower, University of Cincinnati Medical Center
Caroline Schissel, Lahey
Andrew Goldsmith, Lahey Hospital and Medical Center
David Stein, Mass General Brigham

First Author

Tasneem Zaihra Rizvi, Lahey Hospital and Medical Center

Presenting Author

Tasneem Zaihra Rizvi, Lahey Hospital and Medical Center