Thursday, Aug 7: 10:30 AM - 12:20 PM
4219
Contributed Papers
Music City Center
Room: CC-208B
Main Sponsor
Section on Statistical Learning and Data Science
Presentations
One-class classification (OCC) is a specialized machine learning approach for scenarios in which only data from a single class (the target class) are available for training, and any other points are considered outliers. Support Vector Data Description (SVDD) performs OCC by finding a hypersphere that encloses the target class. In this research, we establish a new method that integrates a deep neural network with Least Squares Support Vector Data Description (LS-SVDD) to perform one-class classification by learning a feature space that encloses the target data within a minimal hypersphere. The parameters are optimized using an alternating iterative algorithm, ensuring both high accuracy and fast convergence. With the network weights fixed, the neural network's output serves as input to the LS-SVDD, where the center and radius are determined. The neural network parameters are then updated through backpropagation. This approach allows us to refine the model iteratively, leading to more precise parameter estimation and enhanced anomaly detection. Performance was evaluated on several publicly available real-world datasets.
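A minimal sketch of how such an alternating scheme could look in PyTorch (not the authors' implementation): the encoder architecture is arbitrary, and the LS-SVDD step is approximated by taking the hypersphere center as the mean of the current embeddings and the radius as a high quantile of the distances, since the abstract does not spell out the exact solver.

```python
# Hedged sketch of the alternating scheme described above, not the authors' code.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_dim, rep_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, rep_dim),
        )

    def forward(self, x):
        return self.net(x)

def train_one_class(x_target, epochs=50, lr=1e-3, nu=0.95):
    """x_target: (n, d) tensor containing target-class samples only."""
    enc = Encoder(x_target.shape[1])
    opt = torch.optim.Adam(enc.parameters(), lr=lr)
    for _ in range(epochs):
        # Step 1: with the network weights fixed, solve for center and radius
        # (approximated here by the embedding mean and a distance quantile).
        with torch.no_grad():
            z = enc(x_target)
            center = z.mean(dim=0)
            dist = ((z - center) ** 2).sum(dim=1)
            radius = dist.quantile(nu).sqrt()
        # Step 2: with center and radius fixed, update the network by
        # backpropagation to pull target embeddings inside the hypersphere.
        opt.zero_grad()
        z = enc(x_target)
        loss = (((z - center) ** 2).sum(dim=1) - radius ** 2).clamp(min=0).mean()
        loss.backward()
        opt.step()
    return enc, center, radius
```

At test time, a point would be scored by the distance of its embedding to the learned center, with distances beyond the radius flagged as anomalies.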
Keywords
One Class Classification
LS-SVDD
Deep Learning
Return on Equity (ROE) is one of the financial ratios most closely watched by shareholders and potential investors, and a negative ROE can send a negative signal to investors. It is therefore important to identify the financial ratios that influence ROE and the machine learning technique best suited to predicting it. We used four machine learning techniques (Naive Bayes, logistic regression, random forest, and K-nearest neighbours) to identify the determinants of ROE and to predict it. The imbalanced data were sourced from the Integrated Real-time Equity System (IRESS) and comprise all companies listed on the Johannesburg Stock Exchange (JSE) in 2019. The data were balanced using original observations from previous years and the SMOTE and ROSE oversampling methods. The model evaluation metrics used include sensitivity, specificity, precision, F1 score, and accuracy. The identified predictors were net profit margin (NPM), interest cover (IC), earnings per share (EPS), earnings yield (EY), and price per earnings (PPE). Random forest dominated performance on all datasets and performed well even on the imbalanced dataset.
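For readers unfamiliar with the pipeline, a hedged sketch of the balance-then-classify step in Python: the file and column names (NPM, IC, EPS, EY, PPE, and a negative-ROE indicator) are hypothetical placeholders, and only the SMOTE branch is shown.

```python
# Illustrative sketch only: balance an imbalanced ROE dataset with SMOTE and
# fit a random forest, as described above. Column names are placeholders,
# not the actual IRESS variable names.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("jse_2019.csv")                      # hypothetical file
X = df[["NPM", "IC", "EPS", "EY", "PPE"]]
y = df["ROE_negative"]                                # 1 = negative ROE

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # oversample minority class

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_bal, y_bal)
print(classification_report(y_te, rf.predict(X_te)))  # precision, recall, F1, accuracy
```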
Keywords
Returns on equity (ROE)
Machine Learning techniques
SMOTE and ROSE oversampling
Machine learning classifiers
ROE predictors
IRESS
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition marked by atypical brain connectivity. This study presents a novel computational framework that utilizes an Attention-Based Graph Convolutional Network (GCN) to detect ASD. We use functional Magnetic Resonance Imaging data from the Autism Brain Imaging Data Exchange repository to construct functional connectivity matrices based on Pearson correlation, which capture the interactions among the brain regions defined by the AAL atlas. Connectivity matrices are transformed into graph representations, where the nodes represent brain regions and the edges encode functional connections. The Attention-Based GCN employs attention mechanisms to identify crucial connectivity patterns, enhancing both interpretability and diagnostic accuracy. The proposed framework achieves an accuracy of 90.57%, precision of 85.90%, and recall of 95.53%, outperforming existing results. This study not only advances the detection of ASD but also underscores the broader potential of Attention-Based GCNs in analyzing complex relational data across various other applications.
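A minimal sketch of the graph construction and a single attention layer in plain PyTorch, offered only to make the idea concrete: the time series is simulated, the 0.3 correlation threshold is an assumption, and the real framework stacks such layers inside a full attention-based GCN with a classification head.

```python
# Hedged sketch, not the authors' model: a Pearson-correlation connectivity
# graph and one graph-attention layer. Dimensions and thresholds are illustrative.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention scorer

    def forward(self, x, adj):
        h = self.W(x)                                    # (N, out_dim) node features
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)], dim=-1
        )
        e = F.leaky_relu(self.a(pairs)).squeeze(-1)      # raw attention scores
        e = e.masked_fill(adj == 0, float("-inf"))       # keep only existing edges
        alpha = torch.softmax(e, dim=-1)                 # attention over neighbours
        return alpha @ h

ts = np.random.randn(200, 116)                 # fake time series: 200 TRs, 116 AAL regions
conn = np.corrcoef(ts.T)                       # Pearson functional connectivity
adj = torch.tensor((np.abs(conn) > 0.3).astype(np.float32))   # thresholded edges (assumed cutoff)
x = torch.tensor(conn, dtype=torch.float32)    # connectivity rows as node features

layer = GraphAttentionLayer(in_dim=116, out_dim=32)
node_emb = layer(x, adj)                       # (116, 32) attended region embeddings
```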
Keywords
ASD
Attention-Based Graph Convolutional Network
fMRI
Machine Learning
functional connectivity
Accurately predicting Downward Shortwave Radiation (DSWR) is important for renewable energy, agriculture, and environmental studies. Global datasets provide DSWR estimates at coarse resolutions but often lack the localized precision required for tasks like energy system planning. This study introduces NN-XGBoost, a novel method that combines nearest-neighbor smoothing with the predictive power of eXtreme Gradient Boosting (XGBoost) to enhance accuracy in downscaling and predicting DSWR.
The proposed model leverages global DSWR data from Open-Meteo and local observations from Ambient Weather. Two prediction strategies are examined: (1) using a single local variable and (2) using multiple local variables. Results show that NN-XGBoost consistently outperforms both XGBoost and ARIMAX, achieving lower error (RMSE) and higher accuracy (R²). This method provides a practical and scalable approach to improving DSWR forecasting and has significant applications in renewable energy planning, environmental monitoring, and agricultural decision-making.
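The abstract does not detail how the two components are coupled; one plausible reading, sketched below under that assumption, is to smooth the coarse global series with a k-nearest-neighbour regressor and feed the smoothed values to XGBoost as an extra feature when predicting the local observations. All data in the sketch are simulated.

```python
# Illustrative sketch of one way nearest-neighbour smoothing could be combined
# with XGBoost for downscaling; this is an assumption, not the paper's exact method.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor

hours = np.arange(1000).reshape(-1, 1)          # hypothetical hourly time index
global_dswr = np.random.rand(1000) * 800        # coarse Open-Meteo-style series (simulated)
local_dswr = global_dswr * 0.9 + np.random.randn(1000) * 30   # fake local target

# Nearest-neighbour smoothing of the coarse series over time.
smoothed = KNeighborsRegressor(n_neighbors=5).fit(hours, global_dswr).predict(hours)

X = np.column_stack([global_dswr, smoothed])    # raw + smoothed global DSWR as features
split = 800
model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X[:split], local_dswr[:split])
pred = model.predict(X[split:])

rmse = np.sqrt(np.mean((pred - local_dswr[split:]) ** 2))
print(f"RMSE on held-out hours: {rmse:.1f} W/m^2")
```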
Keywords
Downward Shortwave Radiation
Nearest-Neighbor XGBoost
Forecast
Artificial intelligence and machine learning advancements have transformed decision-making landscapes in the financial industry. With the advent of more complex credit products, the need for robust and innovative credit risk management has become even more critical.
We harness behavioral scoring insights to develop a machine learning model for credit risk management, thus providing deeper borrower profiling. Data collection was done through focus group interviews and secondary sources. Behavioral data were analyzed to identify patterns, while financial data underwent preprocessing and feature engineering to ensure compatibility with machine learning algorithms.
Machine learning models, including logistic regression, support vector machines, K-nearest neighbors, decision trees, extreme gradient boosting, light gradient boosting, and CatBoost, were trained and evaluated for accuracy, precision, recall, and F1-score. The results demonstrated the effectiveness of ensemble methods, particularly CatBoost, which outperformed other models with an accuracy of 0.87, a precision of 0.88, a recall of 0.86, and an F1-score of 0.87.
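A hedged sketch of the CatBoost branch of such a comparison (not the study's pipeline): the data file, the 'default' target column, and the hyperparameters are hypothetical, and features are assumed to be numeric after preprocessing and feature engineering.

```python
# Sketch only: fit CatBoost on engineered borrower features and report the
# metrics named above. File and column names are hypothetical placeholders.
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

df = pd.read_csv("borrowers.csv")                         # hypothetical data, 0/1 target
X, y = df.drop(columns=["default"]), df["default"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = CatBoostClassifier(iterations=500, depth=6, learning_rate=0.05, verbose=0)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

for name, fn in [("accuracy", accuracy_score), ("precision", precision_score),
                 ("recall", recall_score), ("F1", f1_score)]:
    print(name, round(fn(y_te, pred), 2))
```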
Keywords
Behavioral scoring
Machine Learning
Credit Risk Assessment
CatBoost
Borrower profiling
Ensemble methods
In this work, we develop a new ensemble learning framework, multi-label Random Subspace Ensemble (mRaSE), for multi-label classification problems. Given a base classifier (e.g., multinomial logistic regression, classification trees, K-nearest neighbors), mRaSE works by first randomly sampling a collection of subspaces, then choosing the best ones, those that achieve the minimum cross-validation errors, and finally aggregating the chosen weak learners. In addition to its superior prediction performance, mRaSE also provides a model-free feature ranking that depends on the given base classifier. An iterative version of mRaSE is also developed to further improve performance. A model-free extension of the iterative version, called Super mRaSE, accepts a collection of base classifiers as input to the algorithm. We show that the proposed algorithms compare favorably with state-of-the-art classification algorithms, including random forests and deep neural networks, in extensive simulation studies and two real data applications. The new algorithms are implemented in an updated version of the R package RaSEn.
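The following sketch illustrates the random-subspace idea in Python rather than the RaSEn R implementation; it is simplified to a single binary label with logistic regression as the base classifier, whereas mRaSE handles multi-label problems and adds iterative refinement.

```python
# Hedged sketch of the random-subspace idea: sample feature subspaces, keep those
# with the lowest cross-validation error, and aggregate the fitted weak learners
# by majority vote. Labels are assumed to be 0/1 for simplicity.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def rase_style_fit(X, y, n_subspaces=200, n_keep=20, max_dim=5, seed=0):
    rng = np.random.default_rng(seed)
    scored = []
    for _ in range(n_subspaces):
        dim = rng.integers(1, max_dim + 1)
        feats = rng.choice(X.shape[1], size=dim, replace=False)   # random subspace
        cv_err = 1 - cross_val_score(LogisticRegression(max_iter=1000),
                                     X[:, feats], y, cv=5).mean()
        scored.append((cv_err, feats))
    best = sorted(scored, key=lambda t: t[0])[:n_keep]            # keep best subspaces
    return [(f, LogisticRegression(max_iter=1000).fit(X[:, f], y)) for _, f in best]

def rase_style_predict(models, X):
    votes = np.mean([m.predict(X[:, f]) for f, m in models], axis=0)
    return (votes > 0.5).astype(int)                              # majority vote
```

How often each feature appears among the retained subspaces also yields a simple, model-free feature ranking in the spirit described above.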
Keywords
multi-label classification
ensemble subspace
feature ranking
Small bowel obstruction (SBO) is common in the emergency department (ED), with operative management (OM) recommended within 24 hours for suspected ischemia or treatment failure, though 80% of cases improve with nonoperative care (NOC). We developed logistic regression (LR) and random forest (RF) models to predict NOC in patients admitted through the ED. This secondary analysis uses data from a multicenter retrospective study of SBO patients diagnosed by CT at 10 EDs. Seventy percent of the data was used for training and 30% for testing, with stratification to maintain the case-control ratio, and both the RF and LR models were weighted for class imbalance. From the clinically and statistically significant features, we selected physical exam features, WBC count, creatinine, lactic acid, history of malignancy, and hernia using stepwise regression. Of 1,419 patients with a history of SBO confirmed by CT imaging, 6% required OM. The AUROC for LR was 0.68 (95% CI: 0.56-0.79) vs. 0.56 (95% CI: 0.45-0.67) for RF, with no significant difference (P = 0.176). The misclassification rate was 42% for LR vs. 43% for RF. Statistical models can help triage SBO patients in the ED, though misclassification persists; cutoff values for 95% sensitivity are provided.
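A schematic version of the modeling step, with hypothetical file and column names and without the confidence intervals or sensitivity cutoffs reported above:

```python
# Sketch only: stratified 70/30 split, class-weighted LR and RF, and AUROC.
# File and column names are hypothetical placeholders, not the study's variables.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("sbo_patients.csv")                     # hypothetical dataset
X = df[["wbc", "creatinine", "lactic_acid", "malignancy_history", "hernia"]]
y = df["operative_management"]                           # 1 = required OM

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)    # preserve case-control ratio

lr = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
rf = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X_tr, y_tr)

for name, model in [("LR", lr), ("RF", rf)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name} AUROC: {auc:.2f}")
```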
Keywords
Logistic Regression
Random Forest
Decision Trees
Predictive Models
Small Bowel Obstruction
Non-Operative Management
Co-Author(s)
David Hoaglin
Jonathan Abelson, Lahey Hospital and Medical Center
Steven Stain, Lahey Hospital and Medical Center
Hamid Shokoohi, Harvard Medical School
Nicole Duggan, Brigham and Women’s Hospital
Charles Brower, University of Cincinnati Medical Center
Caroline Schissel, Lahey
Andrew Goldsmith, Lahey Hospital and Medical Center
David Stein, Mass General Brigham
First Author
Tasneem Zaihra Rizvi, Lahey Hospital and Medical Center
Presenting Author
Tasneem Zaihra Rizvi, Lahey Hospital and Medical Center