
CS009 Lightning Session 1, Part 1

Conference: Symposium on Data Science and Statistics (SDSS) 2026

04/29/2026: 1:15 PM - 2:45 PM CDT
Lightning

Room: Milwaukee A

Description

This session will be followed by an e-poster session on April 29 from 2:45 - 3:10 PM.

Chair

Sarah Kalicin, Achieve More With Data, LLC

Tracks

Data Science Applications

AI and LLM Applications

Symposium on Data Science and Statistics (SDSS) 2026

Presentations

A RAG-based Classification Tool for Automating and Standardising Competency Mapping in Public Service

1 Introduction

The Singapore public service, the largest employer in Singapore, manages a diverse workforce (e.g., policy, medical, and military) and uses a comprehensive competency framework to ensure a good fit for roles. This framework comprises approximately 500 functional competencies with defined proficiency levels, serving as a common language for strategic HR planning. However, the scale and diversity of roles make manual mapping of jobs to this framework a cumbersome, time-consuming, and subjective exercise. This subjectivity leads to inconsistent data across agencies, hindering sector-wide talent analysis and strategic insights.

2 Methods
To address these challenges, we developed an automated classification tool. The system uses a Retrieval-Augmented Generation (RAG) architecture: we vectorized the competency framework using an embedding model (nomic-embed-text-v1.5) and stored the vectors in a Faiss index for high-speed retrieval. This retrieval system is integrated with a large language model (LLM), which analyzes a given job title and description. The tool retrieves the most relevant competencies and then prompts the LLM to recommend the appropriate competency and proficiency level.
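The retrieve-then-prompt flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy bag-of-words vector stands in for nomic-embed-text-v1.5, a brute-force cosine search stands in for the Faiss index, and the competency entries, function names, and prompt wording are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumed stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, competencies, k=2):
    """Return the k competency names whose descriptions best match the query."""
    q = embed(query)
    scored = sorted(
        competencies.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


def build_prompt(job_title, job_description, candidates):
    """Assemble the text sent to the LLM (the wording is illustrative only)."""
    options = "\n".join(f"- {name}" for name in candidates)
    return (
        f"Job title: {job_title}\n"
        f"Job description: {job_description}\n"
        f"Candidate competencies:\n{options}\n"
        "Recommend the best-fitting competency and a proficiency level."
    )


# Hypothetical framework entries, invented for illustration only.
COMPETENCIES = {
    "Data Analysis": "analyse data and build statistical models for insights",
    "Policy Development": "develop and review public policy proposals",
    "Clinical Care": "provide medical care and treatment to patients",
}

description = "build statistical models and analyse workforce data"
top = retrieve(description, COMPETENCIES, k=2)
prompt = build_prompt("Data Analyst", description, top)
```

Restricting the prompt to a short list of retrieved candidates is what keeps the LLM's choice inside the roughly 500-entry framework instead of letting it produce free-form labels.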

3 Data/Results
The model, iteratively developed and validated with HR leaders, achieved 90% accuracy when checked against human expert validation. The tool has reduced the time required for manual tagging and assessment by 30%. More significantly, it generates uniform, standardized data, enabling robust, consolidated insights for strategic workforce planning across the entire public sector.

Presenting Author

Lian Ping Ler, Ministry of Manpower

First Author

Zhihan Chen

AI Agents to Enhance Difficult Conversations Among Nursing Students

Effective communication is a critical nursing competency, yet students often experience anxiety and have limited opportunities for practice before clinical encounters. Traditional teaching methods and standardized patient interactions provide constrained, single-instance practice and feedback, hindering both confidence and the transferability of skills. The findings from this study indicate that integrating an AI chatbot into nursing education can serve as an effective preparatory tool to enhance students' confidence, readiness, and communication skills for navigating challenging clinical conversations. Although the chatbot experience had its limitations, it provided most students with a valuable opportunity to practice, reflect, and apply feedback before engaging in live simulations. The combination of positive Likert scale results and qualitative reflections underscores the chatbot's role in reducing anxiety, promoting skill development, and offering structured, accessible practice in a low-stakes environment. As nursing education continues to explore innovative technologies, retrieval-augmented generation (RAG) driven simulation tools have the potential to complement traditional training methods, bridge the gap between theory and practice, and better prepare future nurses for the complexities of real-world patient care. Further research should explore strategies to enhance emotional engagement in AI simulations, examine the long-term impacts on communication competency, and improve chatbot adaptability and realism.

Presenting Author

Hiba Armaghan

First Author

Hiba Armaghan

CoAuthor(s)

Rachel Malander, Creighton University
Lindsay Iverson, Creighton University
Tamara Oliver, Creighton University
Amanda Kirkpatrick, Creighton University
Steven Fernandes, Creighton University

AI Enhanced Clinical Pharmacology: Building Smarter Prescribers Through Interactive, Lifespan Centered Learning

In this project on AI Powered Clinical Pharmacology: Enhancing Lifespan Based Prescribing Through Interactive Learning, I intend to develop and implement three 24/7 retrieval-augmented generation (RAG)-based conversational chatbots to support nurse practitioner students in mastering antimicrobial pharmacology, one of the most challenging and high-stakes areas of their advanced pharmacology course. This innovative approach aligns with national priorities in nursing education and antimicrobial stewardship and has the potential to scale across other complex content areas in healthcare education. My RAG model will take the text, chunk it into parts, convert the chunks into embeddings, store them in a vector database, and retrieve relevant chunks by cosine similarity matching.
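The chunk, embed, store, and match steps listed above can be sketched as follows. This is a hedged illustration only: the window size, the overlap, the word-count "embedding", the list-based "vector database", and the placeholder text are all assumptions, not details taken from the project.

```python
import math
from collections import Counter


def chunk_words(text, size=8, overlap=2):
    """Split text into windows of `size` words that share `overlap` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def unit_vector(text):
    """Toy embedding: L2-normalised word counts (stand-in for a real model)."""
    counts = Counter(w.strip(".,").lower() for w in text.split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {w: v / norm for w, v in counts.items()}


def cosine(u, v):
    """For unit vectors, cosine similarity reduces to the dot product."""
    return sum(x * v.get(w, 0.0) for w, x in u.items())


def build_store(text):
    """'Vector database' reduced to a list of (chunk, vector) pairs."""
    return [(c, unit_vector(c)) for c in chunk_words(text)]


def best_chunk(question, store):
    """Return the stored chunk most similar to the question."""
    q = unit_vector(question)
    return max(store, key=lambda pair: cosine(q, pair[1]))[0]


# Placeholder course text, invented for illustration; not clinical guidance.
NOTES = (
    "Topic one covers how drug doses are adjusted for children. "
    "Topic two covers how kidney function changes dosing in older adults."
)
store = build_store(NOTES)
answer_context = best_chunk("dosing in older adults", store)
```

Overlapping windows are one common way to avoid cutting a sentence's meaning in half at a chunk boundary; a real deployment would tune the window size to the embedding model's input limit.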

Presenting Author

Erika Germinario

First Author

Erika Germinario

CoAuthor(s)

Ashlyn Viereck, Creighton University
Seamus Gerner
Lindsay Iverson, Creighton University
Steven Fernandes, Creighton University

Application of Generative AI to Suicidal Ideation on Social Media

Background
Computational approaches to suicide risk detection using social media have grown rapidly in recent years. However, much of the existing literature relies on oversimplified binary classification frameworks. Additionally, many approaches depend on machine learning models, which struggle to capture semantic nuance in short, informal social media text.

Objective
This study applies generative AI to (1) develop a multi-class framework that distinguishes genuine suicidal ideation from context-absent suicidal statements and exaggerated expressions on Twitter, and (2) compare longitudinal risk profiles across these categories.

Methods
Using a Twitter dataset from 2016 to 2020, approximately 47 million tweets were screened using suicidal keywords, resulting in 3,807 candidate tweets for analysis. The ChatGPT model was applied to classify tweets into four categories: genuine suicidal ideation, context-absent suicidal statements, exaggerated expressions, and no suicidal ideation. Model performance was validated against human annotation. Then, up to 6 months of historical tweets preceding the index tweet, comprising 417,000 tweets, were analyzed using a two-stage ChatGPT-based pipeline to identify suicide risk signals. The OpenAI batch API was used to perform large-scale classification. Group differences in proportional risk-factor burden were assessed using non-parametric Kruskal-Wallis tests.
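As a hedged illustration of the Kruskal-Wallis comparison named above, the sketch below computes the H statistic for per-user risk-factor proportions in three groups. The numbers are invented, the function names are hypothetical, and the tie correction factor is omitted for brevity.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie correction factor not applied)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted_sum += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * weighted_sum - 3 * (n + 1)


# Invented per-user proportions of tweets showing one risk factor.
exaggerated = [0.10, 0.15, 0.20]
context_absent = [0.25, 0.30, 0.35]
genuine = [0.40, 0.45, 0.50]

h_stat = kruskal_wallis_h([exaggerated, context_absent, genuine])
```

Under the null hypothesis H is approximately chi-square distributed with (number of groups - 1) degrees of freedom, so with three groups a value above 5.991 would be significant at the 0.05 level; with groups this small the approximation is rough.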

Results
The AI classifier achieved a weighted F1-score of 0.95. Users expressing genuine suicidal ideation exhibited significantly higher levels of depression, hopelessness, diagnosed psychological disorders, prior suicidal ideation, and self-harm intent compared with users engaging in exaggerated statements. Several risk factors, such as loneliness and negative self-concept, were observed at similar levels across the groups.
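For readers unfamiliar with the metric, a weighted F1-score averages the per-class F1 values using each class's share of the true labels as its weight. The sketch below shows the arithmetic on invented labels; it is an illustration, not the study's evaluation code.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * (tp + fn)  # tp + fn is the class support
    return total / len(y_true)


# Invented labels for six tweets across three of the categories.
truth = ["genuine", "genuine", "exaggerated", "exaggerated", "none", "none"]
predicted = ["genuine", "exaggerated", "exaggerated", "exaggerated", "none", "none"]
score = weighted_f1(truth, predicted)
```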

Conclusion
These findings demonstrate that the AI-driven framework produces internally coherent groupings that align with established psychosocial risk profiles associated with suicidal ideation.

Presenting Author

Sophia Yuan, Parkview High School

First Author

Sophia Yuan, Parkview High School

Evaluating Single-Agent and Multi-Agent AI Architectures for Analytical Modeling

Iterative refinement is central to analytical modeling, where robust outcomes emerge through repeated evaluation, refinement, and validation. As Artificial Intelligence (AI) Agents are increasingly used in analytical workflows, a key architectural question arises: can a single analyst agent manage modeling complexity, or do multi-agent frameworks with explicit role separation justify their higher computational cost?

This paper presents an empirical comparison between (a) a single-agent framework, in which one analyst agent performs data exploration, modeling, evaluation, and revision within a unified workflow, and (b) a multi-agent framework that separates responsibilities across analyst, critic, and refiner agents. We evaluate these architectures across descriptive and predictive analytical tasks.

For well-specified descriptive analytics tasks, such as summary statistics and deterministic transformations, the single-agent framework achieves analytical quality on par with multi-agent designs. It also requires fewer model invocations and lower token usage, making multi-agent coordination overhead unnecessary. In contrast, for predictive analytical tasks, which often involve ambiguity, competing objectives, and subtle modeling failure modes such as data leakage, spurious correlations, and non-linear dependencies, we show that multi-agent workflows more reliably detect these issues, reduce false confidence, and improve generalization, at increased computational cost.

We interpret these findings through an analogy to established human and organizational practices, where independent critique complements core implementation to improve robustness. Overall, our results suggest that multi-agent architectures primarily enhance reliability under uncertainty rather than routine analytical accuracy. We conclude that single-agent workflows are preferable for low-risk, well-defined tasks, while multi-agent designs are justified for high-stakes modeling tasks where systematic error detection is critical.
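The two architectures compared above can be sketched as control loops. The sketch below is an assumption-laden illustration: the agents are plain Python stubs standing in for LLM calls, and the leakage check, feature names, and round limit are invented. It only shows where the extra invocations in the multi-agent design come from.

```python
def run_single_agent(task, analyst):
    """One agent handles the whole workflow in a single invocation."""
    return analyst(task), 1


def run_multi_agent(task, analyst, critic, refiner, max_rounds=3):
    """Analyst drafts, critic reviews, refiner revises until no issues remain."""
    draft, calls = analyst(task), 1
    for _ in range(max_rounds):
        issues = critic(task, draft)
        calls += 1
        if not issues:
            break
        draft = refiner(task, draft, issues)
        calls += 1
    return draft, calls


# Stub agents standing in for LLM calls; their behaviour is invented.
def analyst(task):
    return {"features": ["age", "outcome_next_month"], "model": "logistic"}


def critic(task, draft):
    leaky = [f for f in draft["features"] if f.startswith("outcome")]
    return [f"possible leakage: {f}" for f in leaky]


def refiner(task, draft, issues):
    kept = [f for f in draft["features"] if not f.startswith("outcome")]
    return {**draft, "features": kept}


single_result, single_calls = run_single_agent("predict churn", analyst)
multi_result, multi_calls = run_multi_agent("predict churn", analyst, critic, refiner)
```

In this toy run the multi-agent loop spends four invocations to remove a leaky feature that the single agent leaves in place, mirroring the cost-versus-reliability trade-off the abstract describes.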

Presenting Author

Praveen Gupta Sanka

First Author

Praveen Gupta Sanka

CoAuthor

Vidya Minukuri, Convergence Inc

Improving Road Intersection Classification Using Latent Network Structure

Accurate classification of road network intersections is crucial for effective urban planning and ensuring traffic safety. Traditional machine learning approaches often ignore the topological structure of road networks, treating intersections as independent entities. This study evaluates the effectiveness of augmenting original node features with latent graph representations derived from the Generalized Random Dot Product Graph (GRDPG) model and conducts a comparative analysis across multiple classes of machine learning and deep learning models. Unlike methods that rely on homophily assumptions, GRDPG naturally accommodates heterophilous connectivity patterns, which are common in transportation networks. Experiments on a real-world urban road network demonstrate substantial performance gains, indicating that incorporating latent structural context is critical for accurate intersection classification. These results position GRDPG-based embeddings as a computationally efficient alternative to more complex graph neural network architectures.

Presenting Author

Ramchandra Rimal, Middle Tennessee State University

First Author

Ramchandra Rimal, Middle Tennessee State University

CoAuthor

Abigail Kelly, Middle Tennessee State University

Longitudinal Data-Driven Prediction of Type 2 Diabetes

While substantial research has applied machine learning (ML) methods to cross-sectional datasets, comparatively limited work has evaluated ML approaches within longitudinal study designs. Accurate prediction of type 2 diabetes mellitus (T2DM) using longitudinal data is critical for early detection, risk stratification, and targeted prevention. This study evaluates and compares traditional longitudinal statistical models and ML approaches for predicting incident T2DM using nurse visit data from waves 2, 4, and 6 of the English Longitudinal Study of Ageing (ELSA). The analysis included 8,368 repeated observations from adults aged 50 years and older, with diabetes status assessed over time. Traditional models, including Generalized Linear Mixed Models (GLMM) and Generalized Estimating Equations (GEE), were compared with machine learning (ML) methods, including Random Forest (RF), Mixed-Effects Random Forest (MERF), and Extreme Gradient Boosting (XGBoost). Models were trained and evaluated under consistent subject-level data splits to preserve the longitudinal structure. Predictive performance was assessed using discrimination, classification, and calibration metrics, including AUROC, PR-AUC, sensitivity, specificity, precision, F1-score, log loss, and Brier score. The comparative analysis showed that GEE achieved the strongest overall discrimination and calibration on the test set (AUROC = 0.8289; Brier = 0.0786), closely followed by RF. XGBoost and MERF demonstrated moderate discrimination, while GLMM showed comparatively lower AUROC. At the default 0.5 threshold, all models exhibited high specificity but reduced sensitivity due to class imbalance. After threshold optimization, performance improved substantially across models, with RF and GEE achieving the highest F1-scores and balanced sensitivity–specificity trade-offs. 
These findings emphasize the importance of threshold tuning and appropriate longitudinal modeling strategies when predicting chronic disease risk using repeated measures data.
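The role of threshold tuning under class imbalance can be illustrated with a small worked example. The labels and predicted probabilities below are invented, and the helper names are hypothetical; the sketch computes a Brier score and compares F1 at the default 0.5 threshold with the best threshold on a grid.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def f1_at(y_true, y_prob, threshold):
    """F1-score when predicting positive for probabilities >= threshold."""
    pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(y == 1 and q == 1 for y, q in zip(y_true, pred))
    fp = sum(y == 0 and q == 1 for y, q in zip(y_true, pred))
    fn = sum(y == 1 and q == 0 for y, q in zip(y_true, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, grid):
    """Grid threshold with the highest F1 (first one wins on ties)."""
    return max(grid, key=lambda t: f1_at(y_true, y_prob, t))


# Invented imbalanced sample: two cases among eight observations.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.10, 0.10, 0.20, 0.30, 0.45, 0.35, 0.60]

grid = [i / 100 for i in range(5, 100, 5)]
brier = brier_score(labels, probs)
default_f1 = f1_at(labels, probs, 0.5)
tuned = best_threshold(labels, probs, grid)
tuned_f1 = f1_at(labels, probs, tuned)
```

Here the default threshold misses one of the two cases, while the tuned threshold recovers it at the cost of one false positive. In practice the threshold would be chosen on validation data rather than on the test set, to avoid optimistic estimates.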

Presenting Author

Peggy Akabuah

First Author

Peggy Akabuah

CoAuthor

Kristina Vatcheva, University of Texas Rio Grande Valley

Machine Learning Approaches to Predict Sepsis Risk in Multiple Myeloma Patients Treated with Bispecific Antibodies

Treatment of multiple myeloma (MM) with bispecific antibodies (bsAbs) results in an increased risk of infection, including infection-related mortality; predicting this risk is a significant unmet need. We aimed to develop machine learning models to predict risk of infection in MM patients receiving bsAb therapy.

Clinical data were retrospectively collected in a multi-institutional cohort study (n=353) enrolling patients treated with at least one full dose of teclistamab or talquetamab. Using Python, AutoGluon-Tabular, and PyTorch, a range of machine learning approaches were developed, treating infection and severe infection (CTCAE Grade ≥3) as binary classification problems. To avoid overfitting and address imbalanced data, we used k-fold bagging, automatic sample weighting, and out-of-fold predictions. Feature importance was assessed using SHapley Additive exPlanations.

We included a total of 353 patients: 195 (55%) were male, 258 (73%) were Caucasian, 69 (20%) were African American, 275 (78%) had high-risk disease at diagnosis, and 327 (93%) had triple-class refractory disease. A majority (81%, n=287) underwent prior autologous stem cell transplantation (ASCT). Patients treated with teclistamab were more likely to develop infection within 365 days than those treated with talquetamab (n=98/201 [48.8%] vs. n=41/152 [27.0%], p=0.0002). Neural networks identified patients at risk of developing severe infection (grade ≥3) within 365 days of bsAb therapy with ROC/AUC of 0.78; LightGBMs identified patients at risk of developing severe infection within 90 days of bsAb therapy with ROC/AUC of 0.86; stacked ensemble models identified patients at risk of developing severe infection within 90 days of bsAb therapy with ROC/AUC of 0.88, although this result is at risk of overfitting given the cohort size. Cumulative dose, lymphocyte count, and number of prior ASCT had the largest impact on risk prediction.

To our knowledge, this is the first ML model that predicts infection risk within 90/365 days of initiating bsAbs for MM. Future research will focus on validating these findings in larger cohorts, and evaluating newer techniques.

Presenting Author

Anand George

First Author

Nicholas Semenkovich, Medical College of Wisconsin DSI

CoAuthor(s)

Anand George
Aishee Bag, Rutgers Cancer Institute, New Jersey
Mansi Shah, Rutgers Cancer Institute, New Jersey
Sabarinath Radhakrishnan, Medical College of Wisconsin
Binod Dhakal, Medical College of Wisconsin
Samer Al Hadidi, University of Arkansas for Medical Sciences, Little Rock
Rajshekhar Chakraborty, Herbert Irving Comprehensive Cancer Center, New York
Carolina Schinke, University of Arkansas for Medical Sciences, Little Rock
Anita D’Souza, Medical College of Wisconsin
Aniko Szabo, Medical College of Wisconsin
Meera Mohan, Medical College of Wisconsin

Measuring the Unseen: A Statistical Approach to Quantifying Narrative Bias in Historical Accounts

An objective view of history is often treated as an ideal, yet historical narratives are inevitably shaped by the perspectives of their authors. I use word embeddings to examine the political leanings present in different historical accounts. I outline a framework for estimation and quantifying uncertainty. This project proposes a data-driven approach to representing historical narratives in a shared analytical space, enabling comparison across differing interpretive lenses.

Presenting Author

Annie Nguyen

First Author

Annie Nguyen

CoAuthor

Jonathan Auerbach, George Mason University

Modeling Counts of Census Responses over a Span of Time

For a survey or census, it may be of interest to understand the nature in which responses arrive over time during the operation. Tendencies may vary with subsets of the population - such as those sharing geographic regions or non-geographic characteristics - in ways that can be interesting or informative in planning subsequent operations. This work considers the application of panel count models to daily counts of responses from the 2020 Decennial Census. These data are publicly available at several levels of geography. Panel count models are more commonly used in biostatistics to study counts of recurrent events which occur within a subject between observation times. The framework will be applied to the present setting, with special considerations to account for some of the interesting features of the data.

Presenting Author

Andrew Raim, U.S. Census Bureau

First Author

Andrew Raim, U.S. Census Bureau

On-the-Fly versus Pre-Saved Data Augmentation for Deep Learning–Based Ophthalmic Image Classification

Deep learning has demonstrated strong potential for automated ophthalmic image classification; however, its effectiveness is highly dependent on the availability and diversity of training data. In the absence of augmentation, baseline models trained on limited or imbalanced datasets often exhibit restricted generalization and unstable performance. Data augmentation is therefore widely employed to mitigate data scarcity, yet the effect of the augmentation implementation strategy, on-the-fly (real-time) versus pre-saved (offline), remains insufficiently quantified in ophthalmic imaging.

This study systematically compares on-the-fly and pre-saved augmentation for deep learning–based classification of ophthalmic images using MobileNetV2. Experiments were conducted on two modalities: Optical Coherence Tomography (OCT) and retinal fundus images. Five clinically plausible transformations were evaluated under both strategies and against a no-augmentation baseline: rotation (±10°), translation (±10%), combined rotation–translation, Gaussian noise (σ = 0.05), and horizontal flipping.
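The difference between the two strategies can be sketched with horizontal flipping on toy images stored as nested lists. This is an illustration under assumptions, not the study's pipeline: it assumes that pre-saved augmentation stores the transformed copies alongside the originals, and that on-the-fly augmentation applies the transform at random each epoch without changing the dataset size. The function names and the flip probability are hypothetical.

```python
import random


def hflip(image):
    """Horizontally flip an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def presaved_dataset(images):
    """Offline: augment once, keep the copies, train on the enlarged set."""
    return images + [hflip(img) for img in images]


def on_the_fly_epoch(images, rng, p=0.5):
    """Online: each epoch has the same sample count, flipped at random."""
    return [hflip(img) if rng.random() < p else img for img in images]


# Two tiny 2x3 'images' invented for illustration.
images = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [0, 0, 1]]]

offline = presaved_dataset(images)
rng = random.Random(0)
epochs = [on_the_fly_epoch(images, rng) for _ in range(3)]
```

Under these assumptions the pre-saved set doubles the number of samples seen per epoch, whereas the on-the-fly version varies which samples are flipped from epoch to epoch while keeping the count fixed.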

On the OCT dataset, baseline training achieved only 55% accuracy (macro F1 = 0.53). On-the-fly augmentation did not provide consistent improvements, yielding accuracies between 45% and 57%. In contrast, pre-saved augmentation produced substantial gains, with accuracies of 86% (rotation), 92% (translation), 91% (rotation–translation), 89% (Gaussian noise), and 95% (horizontal flip). For fundus images, baseline performance was higher (89% accuracy). On-the-fly augmentation further improved results, reaching up to 89% accuracy with horizontal flipping and 87% with Gaussian noise. Pre-saved augmentation yielded more modest but stable improvements, with accuracies ranging from 75% to 81%.

Overall, augmentation effectiveness is strongly dataset-dependent: pre-saved strategies favor small or imbalanced datasets, while on-the-fly augmentation benefits large, diverse collections. Across all experiments, horizontal flipping emerged as the most robust and consistently beneficial transformation.

Presenting Author

Gifty Duah, University of Texas Rio Grande Valley

First Author

Gifty Duah, University of Texas Rio Grande Valley

CoAuthor(s)

Eric Nyarko, University of Ghana
Justice Effah, University of Texas Rio Grande Valley
Isaac Numoah, Old Dominion University

Reproducible Extraction of Drug–Event Latency From Pharmacovigilance Narratives Using Large Language Models and Statistical Monitoring

Accurate estimation of the time elapsed between exposure to a pharmaceutical product and the occurrence of an adverse event (i.e., latency) is essential for pharmacovigilance decision-making. In practice, latency information is often embedded within unstructured Individual Case Safety Report (ICSR) narratives and must be manually extracted, a process that is time-consuming, error-prone, and difficult to scale. We describe a hybrid statistical and generative AI framework designed to automate the extraction of dose dates, event onset dates, and supporting quotations from individual case narratives. The framework computes latency as either point estimates or bounded intervals, depending on the level of date precision available. The system leverages a large language model (OpenAI o3) to identify fully and partially specified dates and to extract explicit latency statements. Precise dates allow for direct calculation of latency, and partial dates allow for the creation of bounded latency ranges. When latency is expressed narratively, targeted parsing methods derive corresponding numeric estimates. To mitigate LLM output variability, each extraction is repeated ten times, and results are aggregated using a modal consensus rule. Aggregation produces the most likely (i.e., modal) final dates or intervals alongside a reproducibility score, defined as the proportion of runs supporting the modal outcome. The pipeline was evaluated on 273 real-world ICSRs with reviewer-annotated gold standards. For high-reproducibility cases (~90% reproducibility), the approach achieved ~89% agreement for first-dose latency and ~78% agreement for recent-dose latency, while reducing reviewer latency-extraction time by approximately 50%. Reproducibility demonstrated strong correlation with accuracy and served as a practical confidence indicator for prioritizing human review.
These findings demonstrate that reproducible generative AI pipelines, integrated with statistical aggregation and sampling-based quality assurance, can reliably accelerate latency extraction from ICSRs while preserving transparency and human oversight. Providing accuracy and reproducibility estimates for the analyzed cases supports continuous oversight of the model's performance by an end-user in the loop. This approach enables more scalable, timely, and defensible pharmacovigilance workflows.
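The modal consensus rule, the reproducibility score, and the bounded latency intervals described in this abstract can be sketched as follows. This is a minimal illustration under assumptions: the extracted dates are invented, the function names are hypothetical, partial dates are represented as (year, month) or (year,) tuples, and the lower bound is clamped at zero on the assumption that onset does not precede the dose.

```python
import calendar
from collections import Counter
from datetime import date


def modal_consensus(outputs):
    """Most frequent output and the share of runs that agree with it."""
    value, count = Counter(outputs).most_common(1)[0]
    return value, count / len(outputs)


def date_bounds(year, month=None, day=None):
    """Earliest and latest calendar dates consistent with a partial date."""
    if month is None:
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:
        last = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last)
    exact = date(year, month, day)
    return exact, exact


def latency_days(dose, onset):
    """Latency in days as (min, max); equal bounds mean a point estimate."""
    dose_lo, dose_hi = date_bounds(*dose)
    onset_lo, onset_hi = date_bounds(*onset)
    low = max((onset_lo - dose_hi).days, 0)  # assumes onset follows the dose
    high = (onset_hi - dose_lo).days
    return low, high


# Ten invented extraction runs for one narrative's dose date.
runs = ["2024-03-05"] * 8 + ["2024-03-15", "2024-05-03"]
consensus, reproducibility = modal_consensus(runs)

exact_latency = latency_days((2024, 3, 5), (2024, 3, 12))
bounded_latency = latency_days((2024, 3, 5), (2024, 4))
```

A reproducibility score such as the 0.8 obtained here could then be compared against a review threshold to decide which cases are routed to a human reviewer first.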

Presenting Author

Swarnita Chakraborty, Johnson & Johnson

First Author

Swarnita Chakraborty, Johnson & Johnson

CoAuthor(s)

Geoffrey Gipson, Johnson & Johnson
Yauheniya Cherkas, Johnson & Johnson
Ricardo Vale de Andrade, Johnson & Johnson
Joao Barbosa, Johnson & Johnson
Zainab Aziz Zaveri, Johnson & Johnson
Hien Bui, Johnson & Johnson
Mark Oliver Amponin, Johnson & Johnson

Quantifying Intersectional Bias in Clinical Foundation Models: A Counterfactual Digital Twin Audit

As General Purpose Artificial Intelligence (GPAI) becomes a de facto "learned intermediary" in clinical workflows, traditional single-axis audits fail to capture "Intersectional Toxicity", in which racial and socioeconomic biases synergistically amplify one another. This study evaluates "Prognostic Fatalism" in GPAI, examining how foundation models may automate structural racism by conflating social risk with biological determinism. We utilized a Counterfactual Digital Twin audit protocol, derived from guideline-compliant case reports of oropharyngeal cancer, to isolate causal inference. By holding biological ground truth constant while systematically permuting Race (White/Black) and Insurance Status (Private/Medicaid) across identical patient vignettes, we isolated the model's internal reasoning architecture from clinical evidence. We performed a stability analysis (n=10) of Gemini 3.0 Pro's 5-year overall survival (OS) estimates for complex, high-stakes scenarios. To provide a "white-box" view of decision-making, we developed the Reasoning Attention Index (RAI), a forensic linguistic metric quantifying "Semantic Drift": the model's shift from clinical evidence (pathological tokens) to social profiling (sociodemographic tokens). Compounded marginalization (Black/Medicaid) triggered a statistically significant survival "crash" from 65.0% to 54.3% (p=0.002), with a 40% "fatalistic failure rate" for potentially curable conditions. The RAI increased ten-fold for marginalized profiles (0.38 vs. 0.04, p<0.001), indicating a massive diversion of computational attention toward social factors rather than biological markers. Qualitative analysis revealed "Staging Drift" in 60% of iterations, where the AI incorrectly applied terminal staging logic to favorable biology based solely on social markers. These findings suggest that current FDA and NIST risk frameworks require a paradigm shift toward mandating intersectional stress-testing and reasoning-state audits.
Such oversight is essential to prevent a "Digital Nocebo Effect" and ensure clinical AI compliance.
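The counterfactual design, in which the clinical content is held constant while two sociodemographic attributes are permuted, can be sketched as follows. The template wording, function names, and placeholder findings are hypothetical and do not reproduce the study's vignettes; the sketch only shows how the four profiles are generated and how to check that nothing else varies.

```python
from itertools import product

# Placeholder template; the study's actual vignette text is not reproduced here.
TEMPLATE = (
    "Patient profile: race={race}; insurance={insurance}. "
    "Clinical findings: {findings}"
)


def build_vignettes(findings, races, insurances):
    """One vignette per (race, insurance) pair with identical clinical findings."""
    return {
        (race, insurance): TEMPLATE.format(
            race=race, insurance=insurance, findings=findings
        )
        for race, insurance in product(races, insurances)
    }


def clinical_part(vignette):
    """Text after the sociodemographic header, used to check it never varies."""
    return vignette.split("Clinical findings: ", 1)[1]


vignettes = build_vignettes(
    findings="[identical guideline-based case details]",
    races=["White", "Black"],
    insurances=["Private", "Medicaid"],
)
constant = {clinical_part(v) for v in vignettes.values()}
```

Because every vignette shares the same clinical text, any systematic difference in a model's outputs across the four profiles can be attributed to the permuted attributes rather than to the case details.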

Presenting Author

Lei Guo, VA St. Louis HealthCare System

First Author

Lei Guo, VA St. Louis HealthCare System

CoAuthor

Shuimei Liu, China University of Political Science and Law

AI as a Teaching Tool for Tonsillolith Detection

Developing diagnostic accuracy is a core goal of dental education, yet students often struggle to identify tonsilloliths on panoramic radiographs. Their limited clinical experience, combined with traditional teaching methods that rely heavily on manual interpretation, frequently results in diagnostic errors and low confidence.

This project aims to improve student learning by incorporating AI tools into radiology training. We will compare how accurately and efficiently dental students detect tonsilloliths with and without AI support, while also gathering their perceptions of AI as a learning resource. Diagnostic performance will be measured quantitatively, and qualitative feedback will reveal how AI guidance affects understanding, confidence, and engagement. The central question is whether AI can help close the gap between novice and expert performance in radiographic interpretation.

The findings will inform curriculum development in dental radiology and support teaching approaches that prepare students to use AI confidently and competently in clinical practice.

Presenting Author

Seamus Gerner

First Author

Seamus Gerner

CoAuthor(s)

Danielle Carroll, Creighton University
Niranzena Panneer Selvam, Instructor
Steven Fernandes, Creighton University