Machine Learning and Deep Learning: Frontier Methods and Applications

Frederick Kin Hing Phoa, Chair
Academia Sinica
 
Tuesday, Aug 5: 8:30 AM - 10:20 AM
4092 
Contributed Papers 
Music City Center 
Room: CC-106A 

Main Sponsor

International Chinese Statistical Association

Presentations

Comparative Sentiment Analysis Integrating Large Language Models for Company Website Reviews

Customer feedback plays a particularly important role in tourism, affecting travelers' decisions when choosing accommodations. Researchers in the tourism industry tend to apply techniques such as the Naïve Bayes approach, Latent Dirichlet Allocation (LDA), and the Structural Topic Model (STM) to help hosts identify the pivotal factors contributing to their property's reputation. However, sarcasm, implicit sentiment, and contextual ambiguity often cause these methods to classify reviews inaccurately. To address these limitations, we propose leveraging Large Language Models (LLMs) to rephrase customer reviews before applying sentiment analysis methods. Compared with the same methods applied without LLM rephrasing, the results show that the hybrid methodology incorporating LLMs significantly enhances the performance of LDA and STM, providing more accurate and reliable classification and interpretation of customer reviews. 
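A minimal sketch of the rephrase-then-classify idea, under loud assumptions: the LLM call is replaced by a hypothetical `rephrase_with_llm` stub that looks up a hand-written paraphrase, and the classifier is a from-scratch multinomial Naive Bayes (one of the baseline methods named above), not the authors' LDA or STM pipeline. The toy training reviews are invented for illustration only.

```python
import math
from collections import Counter
from typing import Callable, List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase, whitespace-split, and strip simple punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        v = len(self.vocab)

        def log_score(c: str) -> float:
            # Words never seen in training are ignored.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
                for w in tokenize(doc)
                if w in self.vocab
            )

        return max(self.classes, key=log_score)


# Hypothetical stand-in for an LLM call: a hand-written paraphrase lookup.
_PARAPHRASES = {
    "Oh great, a lovely view of the parking lot.": "The view was bad, only a parking lot.",
}


def rephrase_with_llm(review: str) -> str:
    return _PARAPHRASES.get(review, review)


def classify(review: str, model: NaiveBayes,
             rephrase: Optional[Callable[[str], str]] = None) -> str:
    """Optionally rephrase the review before classifying it."""
    text = rephrase(review) if rephrase else review
    return model.predict(text)


train_docs = [
    "The room was clean and the staff were friendly",
    "Lovely view and great breakfast",
    "The room was dirty and the staff were rude",
    "Bad service and a cold shower",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)

sarcastic = "Oh great, a lovely view of the parking lot."
print(classify(sarcastic, model))                              # pos
print(classify(sarcastic, model, rephrase=rephrase_with_llm))  # neg
```

In this toy run the sarcastic review shares its positive words ("great", "lovely", "view") with the positive training reviews, so the plain classifier labels it positive, while the explicit paraphrase is labeled negative. A real pipeline would replace the stub with an actual LLM request.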

Keywords

Sentiment Analysis

Large Language Model

LDA

STM 

Co-Author(s)

Gaoya Tu, North Dakota State University Main Campus
Bong-Jin Choi, North Dakota State University

First Author

Jing Bai, North Dakota State University Main Campus

Presenting Author

Jing Bai, North Dakota State University Main Campus

Scrutinizing AI Classification Performance Using Neutral Zones

As Artificial Intelligence (AI) becomes more ubiquitous, it makes sense to ask what exactly we mean by artificial intelligence. Since statistics is generally considered the science of data, it is natural to bring statistical thinking to AI. The conundrum, however, is that relatively few papers examine statistical thinking in AI; most focus on deploying AI to data problems. A seminal paper by Yu and Kumbier (2017) introduced the P-Q-R-S framework (Population, Question, Representation, and Scrutiny) as essential components in deploying AI systems. This presentation focuses on the S component, exploring neutral zones as a method for managing ambiguity in classification. Specifically, we examine the framework proposed by Jeske and Smith (2018), which introduced neutral zones for linear and quadratic discriminant analysis (LDA and QDA), enabling control over the false positive rate (FPR) and false negative rate (FNR) in ambiguous cases. We also discuss the challenges of constructing neutral zones for non-model-based methods, such as k-nearest neighbors (KNN) and artificial neural networks (ANN), whose output is limited to class probabilities. Through this exploration, we aim to highlight the importance of statistical scrutiny in enhancing the reliability and interpretability of AI systems. 
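As a hedged illustration of the probability-only setting mentioned above (for example, KNN or ANN outputs), the sketch below picks a neutral zone [lower, upper) on a labeled calibration set so that the empirical FPR (share of true negatives assigned to class 1) and FNR (share of true positives assigned to class 0) stay at or below user-chosen targets `alpha` and `beta`. This simple empirical thresholding rule is an assumption made for illustration; it is not the Jeske and Smith (2018) construction for LDA and QDA, and the calibration data are invented.

```python
import math
from typing import List, Optional, Tuple


def fit_neutral_zone(probs: List[float], labels: List[int],
                     alpha: float, beta: float) -> Tuple[float, float]:
    """Choose (lower, upper) cuts on P(class 1) from calibration data.

    FPR = share of true negatives with p >= upper (labeled 1).
    FNR = share of true positives with p < lower (labeled 0).
    """
    neg = [p for p, y in zip(probs, labels) if y == 0]
    pos = [p for p, y in zip(probs, labels) if y == 1]
    cands = sorted(set(probs))
    # Smallest cut whose empirical FPR is at most alpha (inf = never say 1).
    upper = next((u for u in cands
                  if sum(p >= u for p in neg) / len(neg) <= alpha), math.inf)
    # Largest cut whose empirical FNR is at most beta (the minimum always qualifies).
    lower = next(l for l in reversed(cands)
                 if sum(p < l for p in pos) / len(pos) <= beta)
    # If the cuts cross, there is no neutral zone; clipping keeps FNR <= beta.
    return min(lower, upper), upper


def classify_with_neutral_zone(p: float, lower: float, upper: float) -> Optional[int]:
    """Return 1, 0, or None (neutral: defer the decision)."""
    if p >= upper:
        return 1
    if p < lower:
        return 0
    return None


calib_p = [0.05, 0.10, 0.20, 0.35, 0.40, 0.45, 0.55, 0.60, 0.65, 0.80, 0.90, 0.95]
calib_y = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
lower, upper = fit_neutral_zone(calib_p, calib_y, alpha=0.1, beta=0.1)
print(lower, upper)  # 0.4 0.65
print([classify_with_neutral_zone(p, lower, upper) for p in (0.3, 0.5, 0.7)])  # [0, None, 1]
```

Tightening `alpha` and `beta` widens the neutral zone, trading more deferred cases for fewer committed errors; with looser targets the two cuts can meet and the zone disappears.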

Keywords

Neutral zones

classification 

First Author

Mengqi Yin

Presenting Author

Mengqi Yin

Human Disease Network Analysis of Clinical Treatment Outcomes with Zero Inflation

There has been significant attention to human disease network (HDN) analysis, which describes how diseases are interconnected. Compared with gene-centric and phenotypic HDNs, clinical treatment-based HDNs can have more direct practical relevance. For common cancers, our goal is to conduct HDN analyses of inpatient and outpatient treatments separately, as these have significantly different clinical implications and data patterns. This effort can help in understanding not only individual cancers but also their commonalities and differences, which have scarcely been examined under the HDN framework.
We mine SEER-Medicare data for subjects diagnosed with 10 common cancers from 2004 to 2017. In the inpatient and outpatient treatment settings, 113 and 168 diseases, respectively, are analyzed. We develop a deep neural network (DNN)-based estimation approach that adopts an additive two-part loss function to accommodate zero inflation, a DNN to accommodate nonlinearity, and penalization to identify network edges. In the data analysis, we obtain sensible findings on the interconnections and modular structures of individual cancers. 
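The abstract does not state the exact loss, so the following is a hedged sketch of one common additive two-part form: a cross-entropy part for whether a treatment outcome is nonzero plus a squared-error part fit only on the nonzero outcomes. The group-lasso penalty on first-layer input weights is our assumption for how penalization could identify edges: if every weight leaving input disease j is shrunk to zero, disease j has no edge to the modeled outcome. The DNN itself is omitted; `p_nonzero` and `mu_positive` stand for its two output heads (hypothetical names), and the numbers are invented.

```python
import math
from typing import List, Sequence


def two_part_loss(y: Sequence[float], p_nonzero: Sequence[float],
                  mu_positive: Sequence[float], eps: float = 1e-12) -> float:
    """Additive two-part loss for a zero-inflated outcome y >= 0."""
    n = len(y)
    # Part 1: cross-entropy for the indicator 1{y > 0}.
    bce = -sum(math.log(p + eps) if yi > 0 else math.log(1.0 - p + eps)
               for yi, p in zip(y, p_nonzero)) / n
    # Part 2: squared error, computed on the nonzero outcomes only.
    pos = [(yi, m) for yi, m in zip(y, mu_positive) if yi > 0]
    sq = sum((yi - m) ** 2 for yi, m in pos) / max(len(pos), 1)
    return bce + sq


def group_lasso_penalty(first_layer: List[List[float]], lam: float) -> float:
    """lam * sum_j ||W[:, j]||_2, where W[h][j] links input j to hidden unit h."""
    n_inputs = len(first_layer[0])
    return lam * sum(math.sqrt(sum(row[j] ** 2 for row in first_layer))
                     for j in range(n_inputs))


def selected_edges(first_layer: List[List[float]], tol: float = 1e-8) -> List[int]:
    """Inputs whose weight group is nonzero, i.e. candidate network edges."""
    return [j for j in range(len(first_layer[0]))
            if any(abs(row[j]) > tol for row in first_layer)]


y = [0.0, 0.0, 2.0, 3.0]
loss = two_part_loss(y, p_nonzero=[0.5] * 4, mu_positive=[0.0, 0.0, 2.0, 3.0])
W = [[3.0, 0.0], [4.0, 0.0]]  # input 1 has an all-zero weight group
total = loss + group_lasso_penalty(W, lam=0.1)
print(selected_edges(W))  # [0]
```

Splitting the loss this way lets the zero/nonzero pattern and the magnitude of nonzero outcomes be modeled separately, which is the usual motivation for two-part models under zero inflation.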

Keywords

clinical treatment outcomes

human disease networks

cancers

SEER-Medicare data 

First Author

Jiping Wang

Presenting Author

Jiping Wang

Leveraging Regression Transformers and SHAP for Alzheimer’s Research: A Path to Deeper Insights

We leverage machine learning models to better understand Alzheimer's disease, using cognitive scores as the dependent variable and other clinical variables as explanatory variables to uncover their influence on disease development. Specifically, we employ the Regression Transformer model, an adaptation of the transformer architecture tailored for regression, whose self-attention mechanism models dependencies across both temporal and feature dimensions and accommodates missing data. In addition, we couple the model with SHAP (Shapley Additive exPlanations) to address the interpretability issues of machine learning models. We compare our approach with previous machine learning work on Alzheimer's disease progression, such as reinforcement learning, and with traditional mixed-effects regression, both through simulation and by applying it to the A4 trial data to gain additional insights. 
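SHAP attributes a prediction to input features through Shapley values. The `shap` package approximates these for large models; for intuition, the hedged sketch below computes them exactly by enumerating all feature coalitions and replacing "absent" features with a single baseline value (a simplifying assumption, since SHAP explainers typically average over a background dataset). `predict` is a hypothetical stand-in for a fitted Regression Transformer, and the feature values are invented.

```python
import itertools
import math
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[List[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):  # coalition sizes 0 .. n-1
            weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, r):
                s = set(coalition)
                without_i = [x[j] if j in s else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Hypothetical model with an interaction between features 0 and 1.
def predict(v: List[float]) -> float:
    return 2.0 * v[0] + 3.0 * v[1] - v[2] + v[0] * v[1]


x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = exact_shapley(predict, x, base)
print([round(p, 6) for p in phi])  # [3.0, 7.0, -3.0]
```

The attributions sum to `predict(x) - predict(base)` (the efficiency property), and the interaction term `v[0] * v[1]` is split equally between features 0 and 1. Enumeration needs on the order of 2^n model evaluations, which is why practical SHAP tools approximate.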

Keywords

Machine learning

Regression Transformer

interpretability

SHAP (Shapley Additive exPlanations)

Alzheimer’s Disease

Missing Data 

Co-Author

Hongbing Zhang, University of Kentucky

First Author

Ruiyi Jiang, University of Kentucky

Presenting Author

Ruiyi Jiang, University of Kentucky

WITHDRAWN Multidimensional Functional Data Analysis Using Marginal Product Basis System

Advances in data storage and sensor technology have led to a surge in multidimensional functional datasets in fields such as neuroimaging and climate science. A common analytic approach first maps discrete observations into continuous functional representations and then performs statistical analysis on the smoothed functions. However, traditional one-dimensional (univariate) functional data analysis approaches suffer from the curse of dimensionality when extended to multidimensional domains. We propose a computational framework for learning continuous representations of multidimensional functional data that overcomes these challenges. Our method constructs representations from data-adaptive separable basis functions and estimates them efficiently via tensor decomposition of a transformed data structure. We further incorporate roughness regularization through differential operator-based penalties. In this presentation, we discuss key theoretical properties of our approach and present extensive simulation studies showcasing its advantages over existing methods. 
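For intuition about separable representations, the simplest case is a single two-dimensional surface observed on a grid and approximated by one product term sigma * u(s) * v(t), obtained here by power iteration (a rank-1 truncated SVD). This is a hedged toy sketch, not the proposed method, which estimates data-adaptive marginal bases by tensor decomposition of a transformed data structure across many samples. The `second_difference_penalty` helper is a hypothetical discrete stand-in for a differential-operator roughness penalty such as the integral of the squared second derivative; the grid and surface are invented.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def rank_one_separable(M: Matrix, iters: int = 100) -> Tuple[float, List[float], List[float]]:
    """Approximate M[i][j] ~= sigma * u[i] * v[j] by power iteration."""
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols  # assumes the leading right singular vector is not orthogonal to ones
    sigma, u = 0.0, [0.0] * rows
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(a * a for a in u))
        u = [a / norm_u for a in u]
        v = [sum(M[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(b * b for b in v))
        v = [b / sigma for b in v]
    return sigma, u, v


def second_difference_penalty(f: List[float], h: float) -> float:
    """Discrete analogue of the integral of f''(s)^2 on a grid with spacing h."""
    return sum(((f[k - 1] - 2 * f[k] + f[k + 1]) / h ** 2) ** 2 * h
               for k in range(1, len(f) - 1))


s = [k / 4 for k in range(5)]  # grid on [0, 1], spacing 0.25
t = [k / 2 for k in range(3)]  # grid on [0, 1], spacing 0.5
M = [[(1 + si) * (2 - tj) for tj in t] for si in s]  # exactly separable surface
sigma, u, v = rank_one_separable(M)
err = max(abs(M[i][j] - sigma * u[i] * v[j]) for i in range(len(s)) for j in range(len(t)))
print(err < 1e-9, second_difference_penalty(u, h=0.25) < 1e-9)  # True True
```

Because the marginal factor u is linear in s here, its roughness penalty is essentially zero; a wiggly marginal would be penalized, which is the role the differential-operator penalties play in smoothing.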

Keywords

multidimensional functional data analysis

tensor decomposition

basis representation

functional principal component analysis

brain image analysis 

Co-Author(s)

Arun Venkataraman, Penn Medicine
Xing Qiu

First Author

William Consagra, University of South Carolina