Contributed Poster Presentations: Business Analytics/Statistics Education Interest Group

Shirin Golchi Chair
McGill University
 
Monday, Aug 4: 2:00 PM - 3:50 PM
4067 
Contributed Posters 
Music City Center 
Room: CC-Hall B 

Main Sponsor

Business Analytics/Statistics Education Interest Group

Presentations

64: Ensuring Model Performance Reliability through a Data-Centric Approach

Businesses optimize ML models for marginal performance gains, but how often are business decisions made with full awareness of data quality?

The importance of data quality, and the effort required to maintain it, are not new. However, the industry still lacks a standard way to quantify and monitor data quality. While companies rigorously optimize their models, data issues can quietly undermine performance, introduce bias, and lead to costly mistakes. For example, data errors at a leading credit agency, such as misreported numbers of inquiries and incorrect tradeline ages, led to significant financial losses.

This study introduces the Data Reliability Score (DRS), a longitudinal metric for assessing data quality across training and inference. Just as performance metrics such as Accuracy and Mean Squared Error track model quality, DRS provides continuous monitoring across six key pillars rooted in statistical methodology: Lineage, Completeness, Consistency, Bias, Frequency, and Accuracy.

By identifying issues proactively, DRS helps businesses ensure data reliability and prevent failures. Just as low-performing models aren't deployed, data with a low DRS should not be trusted for making business decisions.
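The abstract does not say how DRS is computed, so the following is only a minimal sketch of the general idea under stated assumptions: two of the six pillars (Completeness and Consistency) are scored as fractions in [0, 1] and combined by a weighted mean, then compared against a trust threshold in the same way a model must clear a minimum accuracy before deployment. The function names, weights, value range, and the 0.9 threshold are all hypothetical and are not the authors' method.

```python
from typing import Dict, Optional, Sequence

# Hypothetical sketch: the abstract does not say how DRS is computed.
# Pillar scores are assumed to lie in [0, 1]; the weights and the
# trust threshold below are illustrative, not the authors' values.

def completeness(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are not missing (None)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def consistency(values: Sequence[Optional[float]], low: float, high: float) -> float:
    """Fraction of non-missing entries inside the expected [low, high] range."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return sum(low <= v <= high for v in present) / len(present)


def reliability_score(pillars: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of pillar scores (one possible aggregation rule)."""
    total_weight = sum(weights[name] for name in pillars)
    return sum(score * weights[name] for name, score in pillars.items()) / total_weight


# Toy "number of inquiries" column: two missing values and one impossible negative count.
inquiries = [2, 0, None, 5, -1, 3, None, 1]
pillars = {
    "completeness": completeness(inquiries),       # 6 of 8 present
    "consistency": consistency(inquiries, 0, 50),  # 5 of 6 in range
}
weights = {"completeness": 1.0, "consistency": 1.0}
score = reliability_score(pillars, weights)

TRUST_THRESHOLD = 0.9  # hypothetical gate, analogous to a minimum model accuracy
status = "trusted" if score >= TRUST_THRESHOLD else "flagged for review"
print(f"DRS sketch = {score:.3f} -> {status}")  # prints: DRS sketch = 0.792 -> flagged for review
```

In this toy column the score of about 0.79 falls below the illustrative gate, so the data would be flagged rather than used for decisions. A fuller version would add scores for Lineage, Bias, Frequency, and Accuracy and track the score over time rather than at a single snapshot.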

Keywords

Data Reliability

Model Performance

Business Decisions Trustworthiness

Preventing Data Decision Failures

Data-Centric AI

Co-Author

Vidya Minukuri, Convergence Inc

First Author

Praveen Gupta Sanka

Presenting Author

Praveen Gupta Sanka

65: LightGBM-Based Multiple Imputation Technique in Meta-Regression to Handle Missing Data

Meta-analysts in the social sciences face challenges when covariates in a meta-regression are missing, because missing covariates can skew statistical inferences. In this study, we investigated the effectiveness of the Light Gradient Boosting Machine (LightGBM) for handling missing data, compared with standard multiple imputation methods such as Predictive Mean Matching (PMM). Through a simulation study, we assessed the performance of these methods by measuring bias and precision across varying proportions of missingness (5%, 15%, and 30%) and different missing data mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The findings revealed that while multiple imputation methods can provide accurate estimates in meta-regression, their efficacy varies at higher rates of missingness. LightGBM showed consistent performance, minimal bias, and stable error ratios across all missing data scenarios, making it practical for multilevel meta-regression. Applying these machine learning techniques in meta-analysis marks a significant methodological advancement, offering a more robust framework for researchers confronting statistical challenges in systematic reviews.
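To make the simulation design concrete, the sketch below shows one simple way to delete a study-level covariate at the 5%, 15%, and 30% rates under each of the three mechanisms. It is not the authors' simulation code: the `ampute` helper, the auxiliary variable `z`, and the sample size are hypothetical, and deleting the top-k values deterministically is a simplification (simulation studies often draw missingness probabilistically, for example from a logistic model). The complete-case mean of `x` illustrates how MAR and MNAR deletion can bias naive estimates, which is what imputation methods such as PMM or LightGBM-based approaches aim to correct.

```python
import random
from typing import List, Optional


def ampute(x: List[float], z: List[float], rate: float, mechanism: str,
           seed: int = 0) -> List[Optional[float]]:
    """Hypothetical helper: set a fraction `rate` of covariate x to None."""
    rng = random.Random(seed)
    n = len(x)
    k = round(rate * n)  # number of values to delete
    if mechanism == "MCAR":
        # Missingness is unrelated to any data: a uniform random subset.
        drop = set(rng.sample(range(n), k))
    elif mechanism == "MAR":
        # Missingness depends only on the fully observed variable z.
        drop = set(sorted(range(n), key=lambda i: z[i], reverse=True)[:k])
    elif mechanism == "MNAR":
        # Missingness depends on the (unobserved) value of x itself.
        drop = set(sorted(range(n), key=lambda i: x[i], reverse=True)[:k])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [None if i in drop else xi for i, xi in enumerate(x)]


rng = random.Random(42)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]     # study-level covariate (true mean 0)
z = [xi + rng.gauss(0, 1) for xi in x]      # fully observed auxiliary variable

for rate in (0.05, 0.15, 0.30):
    for mech in ("MCAR", "MAR", "MNAR"):
        xm = ampute(x, z, rate, mech, seed=1)
        observed = [v for v in xm if v is not None]
        cc_mean = sum(observed) / len(observed)  # complete-case estimate
        print(f"{mech:4s} {rate:.0%}: {n - len(observed)} missing, "
              f"complete-case mean of x = {cc_mean:+.3f}")
```

Because the MAR and MNAR branches remove the studies with the largest `z` or `x`, their complete-case means tend to drift below the MCAR estimate as the missingness rate grows.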

Keywords

Machine Learning

Missing Data

LightGBM

Meta-Analysis

Meta-Regression 

Co-Author

Sushil Sharma, AT&T Labs

First Author

Kamal Chawla, University of Maine

Presenting Author

Kamal Chawla, University of Maine