Tuesday, Aug 5: 8:30 AM - 10:20 AM
4092
Contributed Papers
Music City Center
Room: CC-106A
Main Sponsor
International Chinese Statistical Association
Presentations
Customer feedback plays a particularly important role in tourism, affecting travelers' decisions when choosing accommodations. Researchers in the tourism industry tend to use sentiment analysis techniques such as the Naïve Bayes approach, Latent Dirichlet Allocation (LDA), and the Structural Topic Model (STM) to help hosts find the pivotal factors contributing to the reputation of their properties. However, sarcasm, implicit sentiment, and contextual ambiguity often make the classification inaccurate. To address these limitations, we propose leveraging Large Language Models (LLMs) to rephrase customer reviews before applying sentiment analysis methods. Compared with the same methods applied without LLM rephrasing, the hybrid methodology incorporating LLMs significantly enhances the performance of LDA and STM, providing a more accurate and reliable classification and interpretation of customer reviews.
Keywords
Sentiment Analysis
Large Language Model
LDA
STM
Co-Author(s)
Gaoya Tu, North Dakota State University Main Campus
Bong-Jin Choi, North Dakota State University
First Author
Jing Bai, North Dakota State University Main Campus
Presenting Author
Jing Bai, North Dakota State University Main Campus
As Artificial Intelligence (AI) becomes more ubiquitous, it makes sense to think about what exactly we mean by artificial intelligence. Because statistics is generally considered the science of data, it is natural to consider statistical thinking in AI. The conundrum, however, is that far fewer papers examine statistical thinking in AI than examine the deployment of AI to data problems. A seminal paper by Yu and Kumbier (2017) introduced the P-Q-R-S framework (Population, Question, Representation, and Scrutiny) as essential components in deploying AI systems. This presentation focuses on the S component, exploring neutral zones as a method for managing ambiguity in classification. Specifically, we examine the framework proposed by Jeske and Smith (2018), which introduced neutral zones for linear and quadratic discriminant analysis (LDA and QDA), enabling control over the false positive rate (FPR) and false negative rate (FNR) in ambiguous cases. Additionally, we discuss the challenges of constructing neutral zones for non-model-based methods, such as k-nearest neighbors (KNN) and artificial neural networks (ANN), where output is limited to class probabilities. Through this exploration, we aim to highlight the importance of statistical scrutiny in enhancing the reliability and interpretability of AI systems.
Keywords
Neutral zones
classification
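As a minimal illustration of the neutral-zone idea for a classifier that outputs only class probabilities, the sketch below abstains whenever the predicted probability falls between two cutoffs and then reports the resulting FPR, FNR, and neutral fraction. This is a generic thresholding rule and not the Jeske and Smith (2018) construction; the function names, the cutoffs `lower` and `upper`, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def neutral_zone_label(p_positive: float, lower: float, upper: float) -> str:
    """Label one case from its class-1 probability, abstaining inside (lower, upper).

    Hypothetical rule for illustration: 'positive' if p >= upper,
    'negative' if p <= lower, otherwise 'neutral'.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("need 0 <= lower <= upper <= 1")
    if p_positive >= upper:
        return "positive"
    if p_positive <= lower:
        return "negative"
    return "neutral"


def error_rates(
    probs: Sequence[float], truths: Sequence[int], lower: float, upper: float
) -> Tuple[float, float, float]:
    """Return (FPR, FNR, neutral fraction) for the rule above.

    FPR: share of true negatives labeled 'positive'.
    FNR: share of true positives labeled 'negative'.
    Neutral calls count as neither kind of error.
    """
    labels: List[str] = [neutral_zone_label(p, lower, upper) for p in probs]
    n_neg = sum(1 for t in truths if t == 0)
    n_pos = sum(1 for t in truths if t == 1)
    fp = sum(1 for lab, t in zip(labels, truths) if t == 0 and lab == "positive")
    fn = sum(1 for lab, t in zip(labels, truths) if t == 1 and lab == "negative")
    fpr = fp / n_neg if n_neg else 0.0
    fnr = fn / n_pos if n_pos else 0.0
    neutral = labels.count("neutral") / len(labels) if labels else 0.0
    return fpr, fnr, neutral


# Toy class-1 probabilities and true labels (made up for illustration).
probs = [0.9, 0.6, 0.2, 0.8, 0.4, 0.1]
truths = [1, 1, 0, 0, 1, 0]

no_zone = error_rates(probs, truths, lower=0.5, upper=0.5)    # ordinary 0.5 cutoff
with_zone = error_rates(probs, truths, lower=0.3, upper=0.7)  # abstain in (0.3, 0.7)
```

Widening the interval trades definite calls for neutral ones: on this toy data the neutral zone removes the false negative at the cost of leaving two of six cases unclassified.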
There has been significant attention to human disease network (HDN) analysis, which describes how diseases are interconnected. Compared to gene-centric and phenotypic HDNs, clinical treatment-based HDNs can have more direct practical relevance. For common cancers, our goal is to conduct HDN analysis separately for inpatient and outpatient treatments, which have significantly different clinical implications and data patterns. This effort can assist in better understanding not only individual cancers but also their commonalities and differences, which have been scarcely examined under the HDN framework.
We mine SEER-Medicare for subjects diagnosed from 2004 to 2017 with 10 common cancers. A total of 113 diseases are analyzed in the inpatient treatment setting and 168 in the outpatient treatment setting. We develop a deep neural network (DNN)-based estimation approach, which adopts an additive two-part loss function to accommodate zero inflation, a DNN to accommodate nonlinearity, and penalization to identify network edges. In the data analysis, sensible findings on interconnections and modular structures are made for individual cancers.
Keywords
clinical treatment outcomes
human disease networks
cancers
SEER-Medicare data
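To make the phrase "additive two-part loss" concrete, the sketch below adds a cross-entropy term for whether an outcome is nonzero to a squared-error term on the log scale for the nonzero outcomes, plus an L1 penalty on edge weights. The specific loss components, the log transform, the L1 penalty, and every name here are assumptions for illustration; the abstract does not specify the form the authors use.

```python
import math
from typing import Sequence


def two_part_loss(
    y: Sequence[float],
    p_nonzero: Sequence[float],
    mu_log: Sequence[float],
    weights: Sequence[float] = (),
    lam: float = 0.0,
) -> float:
    """Illustrative two-part loss for zero-inflated, non-negative outcomes.

    Part 1: mean binary cross-entropy for the indicator y > 0, given
            predicted probabilities p_nonzero.
    Part 2: mean squared error between log(y) and mu_log, over y > 0 only.
    Penalty: lam * sum(|w|) over the supplied weights (assumed L1 form).
    """
    if len(y) == 0:
        raise ValueError("y must be non-empty")
    eps = 1e-12  # keeps log() finite when a probability is exactly 0 or 1
    bce = 0.0
    sq = 0.0
    n_pos = 0
    for yi, pi, mi in zip(y, p_nonzero, mu_log):
        pi = min(max(pi, eps), 1.0 - eps)
        z = 1.0 if yi > 0 else 0.0
        bce -= z * math.log(pi) + (1.0 - z) * math.log(1.0 - pi)
        if yi > 0:
            sq += (math.log(yi) - mi) ** 2
            n_pos += 1
    part_one = bce / len(y)
    part_two = sq / n_pos if n_pos else 0.0
    penalty = lam * sum(abs(w) for w in weights)
    return part_one + part_two + penalty


# Toy example: one zero outcome and one positive outcome.
loss_plain = two_part_loss([0.0, 1.0], [0.5, 0.5], [0.0, 0.0])
loss_penalized = two_part_loss([0.0, 1.0], [0.5, 0.5], [0.0, 0.0],
                               weights=[1.0, -2.0], lam=0.1)
```

In a penalized network model of this kind, weights shrunk exactly to zero would correspond to absent edges, which is one common way penalization is used for edge identification.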
We leverage machine learning models to better understand Alzheimer's disease, using cognitive scores as the dependent variable and other clinical variables as explanatory variables to uncover their influence on the disease's development. Specifically, we employ the Regression Transformer model, a specialized adaptation of the transformer architecture tailored for regression analysis, and use its self-attention mechanism to model dependencies across both temporal and feature dimensions and to handle missing data. In addition, we couple the model with SHAP (SHapley Additive exPlanations) to address the interpretability issue with machine learning models. We evaluate our approach against previous machine learning work on Alzheimer's disease progression, such as reinforcement learning, as well as traditional mixed-effects regression, through simulation and by applying it to the A4 trials data to gain additional insights.
Keywords
Machine learning
Regression Transformer
interpretability
SHAP (SHapley Additive exPlanations)
Alzheimer’s Disease
Missing Data
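For readers unfamiliar with the Shapley attributions that underlie SHAP, the sketch below computes exact Shapley values for a small model by averaging each feature's marginal contribution over all feature orderings, with absent features set to a baseline. It does not use the SHAP library or the Regression Transformer; the toy model, the baseline, and the function name are assumptions for illustration, and exhaustive enumeration is practical only for a handful of features.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Features not yet 'switched on' take their baseline value. Each
    feature's attribution is its marginal contribution averaged over
    all n! orderings in which features can be switched on.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]          # switch feature j on
            new = model(current)
            phi[j] += new - prev       # marginal contribution of j
            prev = new
    n_orderings = factorial(n)
    return [v / n_orderings for v in phi]


# Toy model with an interaction between features 0 and 2.
def toy_model(v: Sequence[float]) -> float:
    return 2.0 * v[0] + 3.0 * v[1] + v[0] * v[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is split equally between the two features involved.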
Advances in data storage and sensor technology have led to a surge in multidimensional functional datasets across fields like neuroimaging and climate science. A common analytic approach involves first mapping discrete observations into continuous functional representations, followed by statistical analysis on the smoothed functions. However, traditional one-dimensional (univariate) functional data analysis approaches struggle with the curse of dimensionality when extended to multidimensional domains. We propose a computational framework for learning continuous representations of multidimensional functional data that overcomes these challenges. Our method constructs representations using data-adaptive separable basis functions and efficiently estimates them via tensor decomposition of a transformed data structure. We further incorporate roughness-based regularization through differential operator-based penalties. In this presentation, we discuss key theoretical properties of our approach and provide extensive simulation studies showcasing its advantages over existing methods.
Keywords
multidimensional functional data analysis
tensor decomposition
basis representation
functional principal component analysis
brain image analysis
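The appeal of separable basis functions can be seen in a small sketch: a two-dimensional function is written as a sum of K products of one-dimensional functions, so the number of quantities to estimate grows with the sum of the marginal grid sizes rather than their product. The functional form below, the toy bases, and all names are assumptions for illustration; the sketch shows only the representation, not the data-adaptive estimation or the tensor decomposition described in the abstract.

```python
from typing import Callable, List, Sequence, Tuple

Basis = Callable[[float], float]


def eval_separable(
    coefs: Sequence[float],
    basis_s: Sequence[Basis],
    basis_t: Sequence[Basis],
    s: float,
    t: float,
) -> float:
    """Evaluate f(s, t) = sum_k coefs[k] * basis_s[k](s) * basis_t[k](t)."""
    if not (len(coefs) == len(basis_s) == len(basis_t)):
        raise ValueError("coefs, basis_s and basis_t must have equal length")
    return sum(c * phi(s) * psi(t) for c, phi, psi in zip(coefs, basis_s, basis_t))


def parameter_counts(grid_sizes: Sequence[int], rank: int) -> Tuple[int, int]:
    """Compare storage for a full grid with a rank-K separable form.

    Returns (product of grid sizes, rank * sum of grid sizes).
    """
    full = 1
    for m in grid_sizes:
        full *= m
    separable = rank * sum(grid_sizes)
    return full, separable


# Two hypothetical separable terms on a 2-D domain.
basis_s: List[Basis] = [lambda s: 1.0, lambda s: s]
basis_t: List[Basis] = [lambda t: t, lambda t: t * t]
value = eval_separable([2.0, 3.0], basis_s, basis_t, s=0.5, t=2.0)

# A 50 x 50 x 50 grid versus a rank-5 separable representation.
full_count, separable_count = parameter_counts([50, 50, 50], rank=5)
```

On a 50 x 50 x 50 grid, the full representation has 125,000 values while a rank-5 separable form needs 750 marginal basis values, which is the sense in which separability sidesteps the curse of dimensionality.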