Applications of AI and Machine Learning in Science and Business

Ramchandra Rimal Chair
Middle Tennessee State University
 
Tuesday, Aug 5: 2:00 PM - 3:50 PM
4119 
Contributed Papers 
Music City Center 
Room: CC-104E 

Main Sponsor

Section on Statistical Learning and Data Science

Presentations

Bayesian Machine Learning for Corn Yield Prediction Using Satellite Imagery and Topographic Data

In an era of climate change and growing global food demand, accurate crop yield prediction is pivotal for leveraging advanced technologies to enhance crop management and sustainability. This study compares the prediction performance of several Bayesian machine learning methods using high-resolution PlanetScope imagery and topographic data. Specifically, Bayesian Linear Regression, Bayesian Random Forest, Bayesian Splines, Bayesian Additive Regression Trees, and Bayesian Neural Network models were developed to incorporate uncertainty quantification and achieve enhanced predictive accuracy. Our findings show that the Bayesian Random Forest outperforms the other models in terms of crop yield prediction.
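
To illustrate what "uncertainty quantification" means for the simplest model in the list, here is a minimal sketch of conjugate Bayesian linear regression with a single predictor and no intercept. It is not the authors' implementation: the Gaussian prior, the known noise variance, and the function names `posterior_slope` and `predict` are all assumptions made for illustration.

```python
def posterior_slope(xs, ys, noise_var=1.0, prior_var=10.0):
    """Posterior mean and variance of the slope w in y = w * x + noise.

    Assumes a prior w ~ N(0, prior_var) and Gaussian noise with known
    variance noise_var (both are illustrative assumptions).
    """
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision


def predict(x_new, slope_mean, slope_var, noise_var=1.0):
    """Posterior predictive mean and variance at a new predictor value."""
    pred_mean = slope_mean * x_new
    # Predictive variance = observation noise + uncertainty about the slope.
    pred_var = noise_var + x_new * x_new * slope_var
    return pred_mean, pred_var


# Hypothetical vegetation-index values and yields, for illustration only.
index_values = [0.2, 0.4, 0.6, 0.8]
yields = [2.1, 3.9, 6.2, 7.8]
m, v = posterior_slope(index_values, yields)
print(predict(0.5, m, v))
```

The predictive variance grows with the distance of `x_new` from zero, which is how this model reports lower confidence when extrapolating.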

Keywords

Bayesian Machine Learning

Topographical Data 

Co-Author

Hossein Moradi Rekabdarkolaee, South Dakota State University

First Author

Etornam Kunu

Presenting Author

Etornam Kunu

Data-Driven Portfolio Construction: Machine Learning Applications for Investment Optimization

Optimizing investment portfolios is a long-standing challenge in finance, requiring a balance between maximizing returns and minimizing risk. The integration of traditional portfolio theory with advancements in machine learning has made it increasingly feasible to construct optimal portfolios. This study leverages machine learning techniques to enhance portfolio optimization across all sectors of the S&P 500. We incorporate historical returns, risk factors, dividends, price-to-earnings (PE) ratios, debt-to-equity ratios, and recommendation scores from S&P 500 constituents to identify the best stocks from each sector. Deep learning models are then trained to predict future returns for individual stocks. These predictions are used for portfolio construction, employing modern portfolio theory principles and advanced optimization techniques such as mean-variance optimization and the three-factor asset pricing model. The performance of machine learning-driven portfolios is evaluated against traditional benchmarks using metrics such as the Sharpe ratio, Sortino ratio, and maximum drawdown. 
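
The three evaluation metrics named above have standard definitions, sketched below for a series of per-period returns. This is an illustrative sketch rather than the authors' code; the function names, the zero default for the risk-free rate and target return, and the lack of annualization are assumptions.

```python
import math


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(var)


def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by downside deviation below the target."""
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    # With no returns below the target the ratio is undefined; report inf.
    return mean / downside if downside > 0 else math.inf


def max_drawdown(returns):
    """Largest peak-to-trough loss of cumulative wealth, as a fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst


# Hypothetical per-period returns, for illustration only.
sample = [0.10, -0.50, 0.20]
print(sharpe_ratio(sample), sortino_ratio(sample), max_drawdown(sample))
```

The Sortino ratio penalizes only returns below the target, so a portfolio with large upside swings is not treated as risky in the way it would be under the Sharpe ratio.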

Keywords

Investment Portfolio

Machine Learning

Optimization 

Co-Author(s)

Jayanta Pokharel
Netra Khanal, University of Tampa

First Author

Binod Rimal, The University of Tampa

Presenting Author

Binod Rimal, The University of Tampa

Enhancing the Validity of Online A/B Tests with Divergent Units

In online A/B experiments, aligning the diversion unit with the analysis unit is crucial for unbiased and interpretable results. However, practical constraints frequently force a divergence, for example when business metrics are granular but operational realities, or the nature of the experiments themselves, necessitate diversion at a higher level. This misalignment introduces hierarchical correlations and jeopardizes the statistical validity of experimental outcomes. This research presents a suite of innovative solutions, widely adopted and proven effective within Google Cloud, to address these complexities. Through rigorous simulations and real-world case studies, we demonstrate how these approaches reduce bias, improve statistical power, and deliver actionable insights from A/B experiments with divergent units. Our findings offer practical guidance for experimenters facing these challenges, ensuring business-critical decisions are based on statistically sound evidence.
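
The abstract does not describe the specific solutions, so the sketch below shows one widely used textbook remedy rather than the authors' method: a delta-method variance for a ratio metric, computed from per-diversion-unit totals so that correlation among analysis units within the same diversion unit is accounted for. The function name `ratio_and_variance` and the input layout are assumptions.

```python
def ratio_and_variance(clusters):
    """Ratio metric and its delta-method variance.

    clusters: list of (metric_sum, n_analysis_units) pairs, one pair per
    diversion unit (for example, one pair per account when the metric is
    measured per request). Requires at least two diversion units.
    """
    k = len(clusters)
    s_bar = sum(s for s, _ in clusters) / k
    n_bar = sum(n for _, n in clusters) / k
    ratio = s_bar / n_bar

    # Sample variances and covariance of the per-cluster totals.
    var_s = sum((s - s_bar) ** 2 for s, _ in clusters) / (k - 1)
    var_n = sum((n - n_bar) ** 2 for _, n in clusters) / (k - 1)
    cov_sn = sum((s - s_bar) * (n - n_bar) for s, n in clusters) / (k - 1)

    # First-order Taylor expansion of s_bar / n_bar around the means.
    variance = (var_s - 2 * ratio * cov_sn + ratio ** 2 * var_n) / (k * n_bar ** 2)
    return ratio, variance


# Hypothetical per-account totals for one experiment arm.
arm = [(12.0, 4), (3.0, 1), (20.0, 6), (9.0, 3)]
print(ratio_and_variance(arm))
```

Treating each analysis unit as independent would typically understate this variance when units within a diversion unit are positively correlated, which inflates false-positive rates.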

Keywords

Online experiments

A/B experiment

Google

Experiment design

Bias reduction

Hierarchical data 

Co-Author

Xueqi Zhao

First Author

Tianhong He

Presenting Author

Tianhong He

From Bias to Balance: A Data-Driven Approach to Fair Recruitment Practices

Algorithmic recruiting bias remains a problem, especially when it comes to gender, education, and job category. Measuring bias and creating mitigation methods are crucial as machine learning models increasingly shape hiring. Three datasets are used in this work to study bias at the algorithmic and data levels: COMPAS, Job Salary, and Adult Income. At the data level, we investigate measurement bias and distribution imbalances by evaluating demographic representation and its effect on salary and hiring scores. Disparities are identified using correlation analysis, t-tests, and ANOVA. At the algorithmic level, we assess whether ML models predict outcomes differently for various groups. Random forest and logistic regression serve as baselines, while fairness-aware methods, such as reweighting, adversarial debiasing, and equalized odds post-processing, help reduce bias while maintaining predictive accuracy. According to preliminary findings, demographic characteristics have an impact on recruitment outcomes, which calls for more research. The trade-off between fairness and accuracy is also examined in this study. Biases in user interactions should be investigated in future research.
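
As a rough illustration of how predictions can be compared across groups, the sketch below computes two common group-fairness gaps for binary predictions and a binary group attribute. The abstract does not state which metrics the authors report, so the choice of demographic parity and equalized odds gaps, the function names, and the 0/1 encoding are assumptions.

```python
def rate(values):
    """Share of positive (1) entries; NaN when the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def demographic_parity_diff(y_pred, group):
    """Absolute difference in positive-prediction rates between groups 0 and 1."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(rate(g0) - rate(g1))


def equalized_odds_gaps(y_true, y_pred, group):
    """Absolute between-group gaps in true-positive and false-positive rates."""
    gaps = {}
    for label, name in ((1, "tpr_gap"), (0, "fpr_gap")):
        rates = []
        for g in (0, 1):
            preds = [p for t, p, gg in zip(y_true, y_pred, group)
                     if t == label and gg == g]
            rates.append(rate(preds))
        gaps[name] = abs(rates[0] - rates[1])
    return gaps


# Hypothetical labels, predictions, and group membership.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_diff(y_pred, group))
print(equalized_odds_gaps(y_true, y_pred, group))
```

A gap of zero on both equalized odds components means the model's error rates match across the two groups; mitigation methods such as equalized odds post-processing aim to shrink these gaps.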

Keywords

Algorithmic Bias

Fairness in Hiring

Machine Learning

Bias Mitigation

Fairness Constraints 

Co-Author

Shiyuan Wang, Department of Management, Central Michigan University

First Author

Hairu Fan, Department of Statistics, Actuarial and Data Sciences, Central Michigan University

Presenting Author

Hairu Fan, Department of Statistics, Actuarial and Data Sciences, Central Michigan University

Modeling Language Process as Hierarchical Adaptive Random Distributions

Modeling the process of language is particularly challenging due to its complex nature and the continuous changes required to ensure its relevance. The process of constructing words from a set of characters or sentences from a set of words, among others, is intrinsically hierarchical and conditional. Consequently, we conceptualize the design as recursive, where each level of the process depends on the outcomes of the lower level, and so on. At each level, the process can be seen as a random experiment which involves sequentially selecting one item at a time until a terminal item is reached, where each item is itself a random experiment to be constructed at the lower level. Given that each experiment is inherently conditional, we primarily model this process in discrete intervals. Then, at each trial of the experiment, we characterize the selection probability based on observed outcomes from previous trials. This enables us to analyze when a specific item is most likely to occur and how the selection probability evolves over successive trials. The cost of the estimation step is reduced through sampling. Finally, an estimate of the variance of the parameter estimate is provided.

Keywords

Artificial neural network

Cost reduction

Multinomial distribution

Variance estimation 

First Author

Abdellatif Demnati, Independent Researcher

Presenting Author

Abdellatif Demnati, Independent Researcher

Predicting Quality of a Survey Item from the Question Text

The Survey Quality Predictor (SQP) predicts the quality of survey questions based on 72 question characteristics (e.g., domain, nouns word count, answer scale, length of question). The question characteristics are manually coded. We evaluate whether it is possible to predict the quality of a survey question directly from the natural language text rather than from the 72 survey characteristics. We found that a language model can predict survey item quality directly from the text of the question and answer options, and does so as well as the random forest model based on the 72 manually coded characteristics.
Specifically, we fine-tuned XLM-RoBERTa, a multilingual transformer-based model trained on multiple text corpora in over 100 languages, on our SQP dataset. The current web interface of the Survey Quality Predictor (https://sqp.gesis.org) asks users to manually input the 72 features, which they must code themselves based on a coding manual. Our work shows that the current implementation can be replaced with a much more user-friendly web interface: users simply enter the question text (and answer choices), and our natural language model predicts the question quality.

Keywords

Survey Quality

Language Model

Natural Language Processing

Transformer Model

Random Forest

Deep Learning 

Co-Author(s)

Matthias Schonlau, University of Waterloo
Lydia Repke, GESIS – Leibniz Institute for the Social Sciences
Barbara Felderer, GESIS – Leibniz Institute for the Social Sciences

First Author

Tiancheng Yang, University of Waterloo

Presenting Author

Tiancheng Yang, University of Waterloo

Safety Signal Detection Using AI: From Strategy to Implementation

Our novel drug development method addresses patient adverse reactions using clinical trial data to explore undetected safety signals. Unlike other models, our technique confirms signals through expert opinion, literature search, and trial exploration, enhancing accuracy and scalability.

Our AI system employs the Apriori algorithm (unsupervised) for association rule mining, identifying associations between events using support, confidence, lift, Fisher's exact test, and correlation.
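
For readers unfamiliar with the rule metrics named above, the following sketch computes support, confidence, and lift for a single candidate rule over per-patient sets of adverse events. It does not implement the full Apriori candidate search and is not the authors' system; the function name `rule_metrics` and the toy patient data are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent.

    transactions: list of sets, each holding one patient's adverse events.
    Assumes the antecedent and the consequent each occur at least once.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)          # antecedent present
    n_c = sum(1 for t in transactions if c <= t)          # consequent present
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # both present

    support = n_ac / n                # how often the full rule occurs
    confidence = n_ac / n_a           # P(consequent | antecedent)
    lift = confidence / (n_c / n)     # > 1 suggests positive association
    return support, confidence, lift


# Hypothetical adverse-event sets for four patients.
patients = [
    {"nausea", "fatigue"},
    {"nausea", "fatigue"},
    {"nausea"},
    {"rash"},
]
print(rule_metrics(patients, {"nausea"}, {"fatigue"}))
```

A lift above 1 only indicates co-occurrence beyond what independence would predict; a significance test such as Fisher's exact test on the corresponding 2x2 table is a separate step.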

The model requires high-quality patient-level adverse event data, minimizing bias and balancing demographics. HIPAA compliance, anonymization, secure storage, encryption, and adherence to ethical standards are crucial.

Pooled breast cancer study data will be used to discover and cross-check signals across therapeutic areas, ensuring algorithm reliability. This tool benefits the pharmaceutical industry by enabling early signal detection, reducing safety physicians' workload, and improving data integrity.

Implementing this safety signal detection system has regulatory implications, including stricter reporting, audits, streamlined processes, and timely safety assessment, and it enhances the pharmacovigilance process.

Keywords

Safety Signal

Pharmacovigilance

ChatGPT

Association Rule Mining

Generative AI

Artificial Intelligence 

Co-Author(s)

Abhijit Bapat, Medivant Pharma, LLC
Ashok Srivastava, Trans Atlantic Therapeutics

First Author

Jagannath Ghosh, Medivant Pharma, LLC

Presenting Author

Jagannath Ghosh, Medivant Pharma, LLC