Sunday, Aug 3: 2:00 PM - 3:50 PM
4014
Contributed Speed
Music City Center
Room: CC-104A
Poster presentations for this session will be on display in the JSM Expo Sunday, August 3, 4:00 - 4:45 p.m.
Presentations
Recent efforts across the federal statistical system aim to produce more accurate population estimates that incorporate international migration. Recent research by the U.S. Census Bureau relies on new administrative data to measure international migration into the U.S.; however, such data are often unavailable for subnational geographies, such as states. In this research, we leverage administrative data on inbound flights from the Bureau of Transportation Statistics, travel visa issuance from the Bureau of Consular Affairs, and advanced airline passenger statistics from U.S. Customs and Border Protection to produce novel monthly, state-level estimates and forecasts of immigrant admissions to the U.S. Our methodology uses structural time series models that directly model the trend and seasonal patterns of migration. These new estimates provide a more accurate and timely picture of migration by state and can be easily incorporated into standard demographic models.
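As a rough illustration of the structural time series approach described above, the sketch below fits a local-linear-trend plus monthly-seasonal model with statsmodels to a placeholder series and produces 12-month-ahead forecasts; the data, library choice, and settings are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch: structural time series model (local linear trend +
# monthly seasonality) fit to a placeholder monthly admissions series.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=120, freq="MS")
# Placeholder series standing in for state-level monthly immigrant admissions.
y = pd.Series(1000 + 5 * np.arange(120)
              + 200 * np.sin(2 * np.pi * np.arange(120) / 12)
              + rng.normal(0, 50, 120), index=dates)

model = sm.tsa.UnobservedComponents(y, level="local linear trend", seasonal=12)
fit = model.fit(disp=False)
print(fit.summary())
print(fit.forecast(steps=12))   # 12-month-ahead forecasts
```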
Keywords
Immigration
Demographic estimates
Airline traffic
State space modeling
Forecasting
Flying can be stressful — but some airports make the experience a lot better than others. In this project, we set out to predict customer satisfaction scores (based on J.D. Power rankings) for major U.S. airports using a mix of airport operations data and local economic factors.
We gathered information on how many passengers each airport serves, how often flights are delayed (both outbound and inbound), how often baggage gets lost, the average airfare, the local GDP, and even the region's average annual temperature. Using a blend of statistical modeling and machine learning tools, we explored how these factors connect to how travelers rate their airport experience. We also used visualization tools to identify trends and patterns in travel behavior.
By combining exploratory and inferential approaches, this study gives airport managers and planners a clearer roadmap for making travel a little less stressful — and maybe even a little more enjoyable — for millions of passengers each year.
Keywords
analyzing consumers' travel habits
identify trends and patterns in travel behavior
classical regression methods and neural network techniques
Flight delays can be caused by events such as hazardous weather, crew availability, and security issues. When purchasing flight tickets, many passengers hope to minimize delays to avoid spending too much time at the airport or missing a connecting flight. Flight delays often have a cascading effect, where one flight's delay may influence the next. Additionally, multiple flights may experience delays at the same time when events such as bad weather occur. We hypothesize that airline "hubs" - defined here as an airport/airline pair containing a large percentage of passenger traffic for that airline - may be better equipped to respond to delay perturbations than non-hubs. Herein, a Fast Fourier Transform (FFT) is applied to scheduled arrival/departure times to estimate airport periodicity. The relationship between hub status, periodicity, and delays is explored. We also compare traditional "hub and spoke" airlines, such as Delta, with "point-to-point" airlines, such as Southwest.
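A small sketch of the FFT step described above, applied to simulated hourly counts of scheduled departures at a single airport (the counts and the hourly binning are placeholder assumptions, not BTS data):

```python
# Hypothetical sketch: estimate the dominant periodicity of scheduled departures
# at one airport from hourly departure counts via the FFT.
import numpy as np

rng = np.random.default_rng(1)
hours = np.arange(24 * 60)                      # 60 days of hourly bins
# Placeholder: a daily (24-hour) cycle in scheduled departures plus noise.
counts = 10 + 6 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)

spectrum = np.abs(np.fft.rfft(counts - counts.mean()))
freqs = np.fft.rfftfreq(counts.size, d=1.0)     # cycles per hour
dominant = freqs[np.argmax(spectrum)]
print(f"dominant period ~ {1 / dominant:.1f} hours")   # expect ~24
```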
Keywords
sports
exploratory analysis
Our study presents a longitudinal analysis of relationships between airline flight data from the Bureau of Transportation Statistics and regional partisan shifts derived from the Biographical Directory of the U.S. Congress. Our motivation is to understand and explain relationships between flight issues and the local political climates of regions containing airports, including constructing causal models for these relationships. We focus on the period from 1990 to 2024, which spans several changes in the national political environment and major historical events that influenced flight patterns, including 9/11, the 2008 recession, and the COVID-19 pandemic. Using a spatiotemporal autoregressive model, we identify significant connections between geographic and other factors. Our findings prompted further modeling to explore causal effects and the partisan consequences of air travel. Results suggest that while political climates shape flight issues, air travel disruptions can also influence regional partisan dynamics, forming a feedback loop between transportation infrastructure and political behavior.
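A simplified sketch, on made-up data, of the kind of spatiotemporal autoregressive structure described above: a regional outcome regressed on its own temporal lag and a spatial lag built from a row-normalized adjacency matrix. Plain OLS is used here only to keep the illustration short; a proper SAR/STAR fit would use maximum likelihood or GMM because the spatial lag is endogenous.

```python
# Hypothetical sketch: spatiotemporal autoregression with temporal and spatial
# lags as regressors (OLS for illustration only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_regions, n_years = 20, 30

W = np.zeros((n_regions, n_regions))
for i in range(n_regions):                       # ring adjacency as a stand-in
    W[i, (i - 1) % n_regions] = W[i, (i + 1) % n_regions] = 1
W /= W.sum(axis=1, keepdims=True)                # row-normalize the weights

y = rng.normal(size=(n_years, n_regions))        # placeholder regional outcome
x = rng.normal(size=(n_years, n_regions))        # placeholder covariate

# Stack region-years, pairing each observation with its temporal and spatial lags.
Y = y[1:].ravel()
temporal_lag = y[:-1].ravel()
spatial_lag = (y[1:] @ W.T).ravel()
X = sm.add_constant(np.column_stack([temporal_lag, spatial_lag, x[1:].ravel()]))
print(sm.OLS(Y, X).fit().params)                 # [const, temporal, spatial, covariate]
```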
Keywords
Longitudinal Analysis
Causal Effects
Spatiotemporal Autoregressive Model
Partisan Shifts
Flight Issues
Transportation-Politics
Crude oil price fluctuations significantly impact global economies, financial markets, and energy policies. Detecting anomalies in West Texas Intermediate (WTI) crude oil returns is essential for identifying market shocks and enhancing risk management strategies. This study presents a hybrid anomaly detection framework that integrates statistical techniques (Z-score, Bollinger Bands, GARCH) with machine learning models (Isolation Forest, DBSCAN, Autoencoders). Using daily WTI returns from 2014 to 2024, the analysis identifies both extreme return spikes and complex nonlinear deviations.
The results show that Bollinger Bands and GARCH methods detect a higher number of anomalies, reflecting their sensitivity to volatility, while machine learning techniques such as Isolation Forest and Autoencoders identify subtler, nonlinear patterns. A total of 26 consensus anomalies, detected by at least three methods, highlight major market disruptions that were captured by all six models.
This research demonstrates that combining statistical and machine learning approaches enhances anomaly detection by leveraging their complementary strengths. The findings offer valuable insights for financial risk assessment, market surveillance, and economic policy-making, contributing to more robust decision-making in energy and financial markets.
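A compact sketch of the consensus idea described above, combining three of the six detectors (z-score, Bollinger Bands, Isolation Forest) on placeholder returns and flagging days marked by a majority of them; the thresholds, window length, and contamination rate are illustrative assumptions.

```python
# Hypothetical sketch: flag "consensus" anomalies in daily returns detected by
# at least 2 of 3 methods (z-score, Bollinger Bands, Isolation Forest).
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
returns = pd.Series(rng.normal(0, 0.02, 2500))   # placeholder daily WTI returns
returns.iloc[::500] -= 0.25                      # inject a few artificial shocks

z_flag = (returns - returns.mean()).abs() / returns.std() > 3

ma, sd = returns.rolling(20).mean(), returns.rolling(20).std()
bb_flag = (returns > ma + 2 * sd) | (returns < ma - 2 * sd)

iso = IsolationForest(contamination=0.01, random_state=0)
iso_flag = pd.Series(iso.fit_predict(returns.to_frame()) == -1, index=returns.index)

consensus = (z_flag.astype(int) + bb_flag.astype(int) + iso_flag.astype(int)) >= 2
print("consensus anomalies:", int(consensus.sum()),
      "at positions", list(returns.index[consensus]))
```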
Keywords
Anomaly Detection
WTI Crude Oil Returns
Machine Learning
Financial Risk Management
Volatility Clustering
Isolation Forest
Background: Differentially methylated regions (DMRs) that distinguish cancer patients from non-cancer controls have been identified in tissue. Detection of these cancer-specific DMRs in plasma is challenging due to low bioavailability, thus prompting investigation into identifying DNA fragments with a high likelihood of originating from tumor. Methods: We fit a generalized additive model (GAM) to the percent of methylated fragments in non-cancer controls to estimate an expected methylation profile for 432 DMRs. A centered and scaled deviance score based on the fitted model is calculated for each DMR and used to compare 144 cancer plasma samples representing 8 cancer subtypes versus 71 controls. Results: Of 432 DMRs tested, 49 had p-values < 0.005. Combining all DMRs within a random forest model achieved an out-of-bag prediction AUC of 0.74 for discriminating cases from controls. Conclusion: Future evaluations with training and test sets consisting of >5000 DMRs are underway, with the expectation of improving prediction accuracy for cancer detection and cancer subtyping in plasma. This modeling approach may enhance multicancer detection efforts in cancer screening paradigms.
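A schematic sketch of the scoring step described in the Methods, using simulated data and pygam purely as a convenient GAM implementation (the covariate, data, and software are assumptions, not the study's pipeline): fit a smooth expected-methylation profile in controls for one DMR, then compute a centered and scaled deviation score for new samples.

```python
# Hypothetical sketch: GAM fit to percent methylation in controls, then a
# centered and scaled deviation score for new samples (one DMR).
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(4)
x_ctrl = rng.uniform(0, 1, 200)                        # placeholder covariate
pct_ctrl = 5 + 10 * x_ctrl**2 + rng.normal(0, 1, 200)  # percent methylated, controls

gam = LinearGAM(s(0)).fit(x_ctrl.reshape(-1, 1), pct_ctrl)

# Scale residual spread from the controls themselves.
resid_sd = np.std(pct_ctrl - gam.predict(x_ctrl.reshape(-1, 1)))

def deviance_score(x_new, pct_new):
    """Centered, scaled deviation of observed percent methylation from the
    expected (control) profile -- one score per sample for this DMR."""
    expected = gam.predict(np.asarray(x_new).reshape(-1, 1))
    return (np.asarray(pct_new) - expected) / resid_sd

print(deviance_score([0.2, 0.8], [6.0, 25.0]))         # hypothetical case samples
```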
Keywords
methylation
deviance
generalized additive models
prediction
This project focuses on the analysis and interpretation of a large financial transactions dataset created by Caixabank Tech for the 2024 AI Hackathon, available on Kaggle. The research involves developing interactive Tableau dashboards, including maps of financial transactions across the United States and time series plots to visualize trends over time. In addition to data visualization, I will identify suitable statistical techniques for analysis and apply statistical models and machine learning methods to predict fraudulent transactions within the dataset. By combining data visualization, statistical analysis, and machine learning, this research aims to uncover actionable insights and enhance the detection of fraudulent activities in financial transactions.
Keywords
Fraud Detection
Data Visualization
Financial Transactions
Statistical Models
Machine Learning Techniques
Entrepreneurial opportunities are marginal appraisals of upcoming dividend income. Whether an entrepreneurship's Interests Focus (E.I.F.) is social, economic, or institutional, its Acting Level (E.A.L.) Micro, Meso, or Macro, and its Dynamic Trend (E.D.T.) innovation, impact, or problem solving, opportunities are risks to undertake. Hence, for a dividend strategy π, the risk to undertake can be represented by the controlled surplus process R^π(t) = u + ct − Σ_{i=0}^{N(t)} X_i − L^π(t) for time t ≥ 0, initial capital u ≥ 0, and premium income c ≥ 0. An admissible strategy π ∈ Π has predictable, non-decreasing, and left-continuous accumulated dividends satisfying L^π(t) ≤ u + ct − Σ_{i=0}^{N(t)} X_i. Furthermore, crossing features from E.I.F., E.A.L., and E.D.T. via paired extended decision analysis yields a dynamic n-dimensional decision space. In addition, non-parametric density estimation at state i of the risk process allows computing and updating any m-th moment of the dividend D_u at that state. This finally provides upper bounds for optimal strategies under an Erlang(n) risk model.
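For intuition, a toy Monte Carlo sketch of the controlled surplus process above under a simple barrier dividend strategy (pay out any surplus above a barrier b); the compound Poisson claim model and all parameter values are illustrative assumptions, not the Erlang(n) analysis of the abstract.

```python
# Hypothetical sketch: simulate R^pi(t) = u + c*t - sum of claims - L^pi(t)
# under a barrier strategy (dividends = surplus above barrier b), and estimate
# the mean discounted dividends paid until ruin.
import numpy as np

rng = np.random.default_rng(5)
u, c, lam, claim_mean, b, delta = 10.0, 1.5, 1.0, 1.0, 15.0, 0.03
horizon, dt, n_paths = 50.0, 0.01, 200

discounted_divs = np.zeros(n_paths)
for p in range(n_paths):
    surplus, total = u, 0.0
    for k in range(int(horizon / dt)):
        t = k * dt
        surplus += c * dt
        if rng.random() < lam * dt:                  # Poisson claim arrival
            surplus -= rng.exponential(claim_mean)   # claim size X_i
        if surplus > b:                              # barrier dividend payment
            total += np.exp(-delta * t) * (surplus - b)
            surplus = b
        if surplus < 0:                              # ruin: stop paying dividends
            break
    discounted_divs[p] = total

print("estimated E[discounted dividends] =", round(discounted_divs.mean(), 3))
```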
Keywords
Entrepreneurs Opportunities
Dividends Strategy
Controlled Surplus Process
Entrepreneurships Interests Focus (E.I.F.)
Entrepreneurships Acting Level (E.A.L.)
Entrepreneurships Dynamic Trend (E.D.T.)
Radiomics, the extraction of quantitative features from medical images such as CT scans, may provide clinically relevant insights for cancer patient outcomes beyond the information provided by tumor size changes. Prior studies [Abbas et al 2023, Nardone et al 2024] have examined changes in radiomic features at different time points (termed delta radiomics) to explore their potential as longitudinal biomarkers of cancer response. Additionally, existing studies have shown that delta radiomics (not baseline radiomics) has predictive power, with delta tumor volume being the most important feature. However, few radiomics-based biomarkers have been externally validated. Here, we developed a CT-based radiomic signature score for triple-negative breast cancer (TNBC) and bladder cancer and assessed its association with survival outcomes under pembrolizumab monotherapy. Using a penalized Cox regression model and an analysis of size-change-detrended radiomic features, our findings suggest that CT-based delta radiomics is predictive of survival outcomes but does not add value beyond delta volume.
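A brief sketch of the penalized Cox step, using lifelines with a ridge-type penalty on made-up feature columns; the column names (delta volume and a detrended delta-radiomics score), the data, and the software choice are placeholders for illustration only.

```python
# Hypothetical sketch: penalized Cox regression of survival on delta-volume and
# a detrended delta-radiomics feature (all data simulated placeholders).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n = 150
df = pd.DataFrame({
    "delta_volume": rng.normal(size=n),
    "delta_radiomic_detrended": rng.normal(size=n),
    "time": rng.exponential(12, n),            # months of follow-up
    "event": rng.integers(0, 2, n),            # 1 = event observed, 0 = censored
})

cph = CoxPHFitter(penalizer=0.1)               # ridge-type penalty on coefficients
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()                            # hazard ratios and CIs
```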
Keywords
radiomics
delta radiomics
biomarkers
oncology
survival
Co-Author(s)
Richard Baumgartner, Merck Research Laboratories
Shubing Wang, Merck & Co., Inc.
Lingkang Huang
Yiqiao Liu, Merck & Co., Inc.
Gregory Goldmacher, Merck & Co., Inc.
Antong Chen, Merck & Co., Inc.
Jianda Yuan, Merck & Co., Inc.
Jared Lunceford, Merck & Co., Inc.
First Author
Michelle Ngo, Merck & Co., Inc.
This study aims to develop and validate a novel hybrid neural network (HNN) model that integrates classical statistical methods with ordinary neural networks, combining the strengths of statistical learning and machine learning in terms of structured modeling, flexibility, regularization, and interpretability.
The proposed HNN model incorporates National Institutes of Health Stroke Scale (NIHSS) item scores, demographic information, medical history, and vascular risk factors to predict large vessel occlusion (LVO). Using both simulated and real-world stroke datasets, we evaluated the model's performance based on sensitivity, specificity, accuracy, and area under the curve (AUC). Comparisons were made against other methods, including logistic regression, random forest, decision tree, and ordinary neural networks. Results from the study demonstrate that the HNN model consistently outperforms traditional statistical and ML-based approaches; the accuracy of the HNN exceeds that of logistic regression and ordinary neural networks by at least 3%. By leveraging the complementary advantages of statistical and neural network methodologies, the HNN offers a robust and efficient tool for prehospital LVO detection.
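One possible way to realize a hybrid architecture of the kind described, shown only as a hypothetical sketch rather than the authors' published model: a PyTorch module whose output logit is the sum of an interpretable linear (logistic-regression-style) term and a small neural network correction.

```python
# Hypothetical sketch: a hybrid classifier whose logit = linear (logistic
# regression) part + small MLP part. One plausible reading of a statistics/NN
# hybrid, not the authors' architecture.
import torch
import torch.nn as nn

class HybridNet(nn.Module):
    def __init__(self, n_features, hidden=16):
        super().__init__()
        self.linear = nn.Linear(n_features, 1)          # interpretable main effects
        self.mlp = nn.Sequential(                       # flexible nonlinear correction
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x):
        return self.linear(x) + self.mlp(x)             # combined logit

# Tiny training loop on placeholder data (e.g., NIHSS items + risk factors).
torch.manual_seed(0)
X = torch.randn(500, 15)
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5).float().unsqueeze(1)  # synthetic LVO label

model = HybridNet(n_features=15)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print("final training loss:", float(loss))
```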
Keywords
Machine Learning
Deep Learning
Hybrid Neural Network
LVO
Stroke
predictive model
In a sector of continuous growth and development, the market value of each soccer player has become a key element in the development of a player's career. Market value describes how much a player is worth in the transfer market and is important for soccer clubs in determining the financial standing of players.
Several significant factors can influence the market value of a soccer player, such as age, position, number of goals scored, number of games previously played, etc. Data on MLS (Major League Soccer) players were gathered from MLSsoccer, Transfermarkt, and Opta Sports and were processed to support the goals of this project.
This study aims to build and compare predictive models using machine learning algorithms to estimate the market value of MLS players based on several key factors, which will help clubs and agents objectively predict the worth of a player they would like to buy or sell.
Keywords
soccer
players' market values
MLS (Major League Soccer)
machine learning
Intensive Care Unit (ICU) readmissions among patients with heart failure (HF) impose a substantial economic burden on both patients and healthcare systems. While previous studies have identified various predictors of readmission, consensus on their relative importance and optimal predictive models remains limited. This study aims to evaluate key predictors and assess the performance of different modeling approaches in forecasting 30-day ICU readmissions using the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. This study applied logistic regression, classification trees, and random forest models to develop predictive frameworks. Although overall model performance did not surpass findings from prior studies, hemoglobin emerged as a significant predictor of 30-day readmission, reinforcing its clinical relevance in HF patient management. These findings highlight the challenges and potential of predictive modeling in ICU readmission risk assessment.
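A schematic sketch of the modeling comparison described above on synthetic stand-in data (no MIMIC-IV access is assumed here): fit logistic regression, a classification tree, and a random forest, then rank predictors by permutation importance; the feature name "hemoglobin" is a placeholder label.

```python
# Hypothetical sketch: compare three classifiers for 30-day readmission and rank
# predictors by permutation importance. Data are synthetic stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
feature_names = ["hemoglobin"] + [f"feature_{i}" for i in range(1, 10)]  # placeholders
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "forest": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    print(name, "AUC =", round(roc_auc_score(y_te, m.predict_proba(X_te)[:, 1]), 3))

imp = permutation_importance(models["forest"], X_te, y_te, n_repeats=20, random_state=0)
for i in np.argsort(imp.importances_mean)[::-1][:5]:
    print(feature_names[i], round(imp.importances_mean[i], 4))
```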
Keywords
Heart Failure
ICU Readmission
Electronic Health Record
Machine Learning
Variable Importance
Violent death rates in the United States exhibit pronounced racial disparities that challenge the healthcare, insurance, and public safety sectors. These disparities, shaped by demographics, mental health, substance abuse, and geography, complicate practical risk assessment and targeted interventions. Leveraging data from the National Violent Death Reporting System (NVDRS) for 2020–2021, this study examines racial differences in suicides, homicides, and other violent deaths. Logistic regression models assess the effects of race, age, sex, mental health, substance use, and state-level variability. The results are compared with several machine learning models to evaluate the trade-off between predictive performance and interpretability. Guided by the Social Determinants of Health and structured with the Design Science Framework, findings reveal that logistic regression delivers interpretable, actionable insights while achieving competitive accuracy and sensitivity. These insights enhance our understanding of violent death outcomes and support the development of refined risk profiles and targeted business solutions for high-risk groups.
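To illustrate the interpretability argument above, a small statsmodels sketch on synthetic stand-in data (not NVDRS records) showing how logistic-regression coefficients translate directly into odds ratios with confidence intervals:

```python
# Hypothetical sketch: logistic regression on synthetic data, reported as odds
# ratios with 95% CIs -- the directly interpretable output the abstract
# contrasts with black-box ML models.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 5000
df = pd.DataFrame({
    "age": rng.normal(40, 12, n),
    "male": rng.integers(0, 2, n),
    "substance_use": rng.integers(0, 2, n),
})
logit_p = -3 + 0.02 * df["age"] + 0.6 * df["male"] + 0.9 * df["substance_use"]
df["outcome"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

X = sm.add_constant(df[["age", "male", "substance_use"]])
res = sm.Logit(df["outcome"], X).fit(disp=False)
odds_ratios = pd.DataFrame({"OR": np.exp(res.params),
                            "2.5%": np.exp(res.conf_int()[0]),
                            "97.5%": np.exp(res.conf_int()[1])})
print(odds_ratios)
```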
Keywords
MORTALITY
RACE
LOGISTIC REGRESSION
MACHINE LEARNING
VIOLENT DEATHS
UNITED STATES
Ambulatory blood pressure monitoring (ABPM) is widely used to track blood pressure and heart rate over periods of 24 hours or more. Most existing studies rely on basic summary statistics of ABPM data, such as means or medians, which obscure temporal features like nocturnal dipping and individual chronotypes. To better characterize the temporal features of ABPM data, we propose a novel smooth tensor decomposition method. Built upon traditional low-rank tensor factorization techniques, our method incorporates a smoothing penalty to handle noise and employs an iterative algorithm to impute missing data. We also develop an automatic approach for the selection of optimal smoothing parameters and ranks. We apply our method to ABPM data from patients with concurrent obstructive sleep apnea and type II diabetes. Our method explains temporal components of data variation and outperforms the traditional approach of using summary statistics in capturing the associations between covariates and ABPM measurements. Notably, it distinguishes covariates that influence the overall levels of blood pressure and heart rate from those that affect the contrast between the two.
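A numerical sketch of the main ingredients described above, on simulated data with an illustrative rank and penalty: an alternating-least-squares CP factorization of a subjects x time x measurements tensor in which the time factor is estimated with a second-difference roughness penalty, and missing entries are iteratively imputed from the current reconstruction. This is a simplified toy version, not the authors' algorithm or their parameter-selection procedure.

```python
# Hypothetical sketch: rank-R CP decomposition with a smoothness penalty on the
# time-mode factor and iterative imputation of missing entries (toy version).
import numpy as np
from scipy.linalg import khatri_rao, solve_sylvester

rng = np.random.default_rng(8)
I, J, K, R, lam = 30, 48, 3, 2, 5.0       # subjects, times, measures, rank, penalty

# Simulated smooth ground truth plus noise, with ~10% of entries missing.
t = np.linspace(0, 1, J)
true = np.einsum("ir,jr,kr->ijk", rng.normal(size=(I, R)),
                 np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)]),
                 rng.normal(size=(K, R)))
X = true + 0.3 * rng.normal(size=true.shape)
mask = rng.random(X.shape) < 0.9          # True where observed
X_filled = np.where(mask, X, np.nanmean(np.where(mask, X, np.nan)))

D = np.diff(np.eye(J), n=2, axis=0)       # second-difference (roughness) operator
A, B, C = (rng.normal(size=(n, R)) for n in (I, J, K))

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

for it in range(50):
    # ALS updates; the remaining modes enter through Khatri-Rao products.
    A = unfold(X_filled, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
    KR = khatri_rao(A, C)
    # Smoothed update of the time factor B: B (KR'KR) + lam D'D B = X_(2) KR.
    B = solve_sylvester(lam * D.T @ D, KR.T @ KR, unfold(X_filled, 1) @ KR)
    C = unfold(X_filled, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    recon = np.einsum("ir,jr,kr->ijk", A, B, C)
    X_filled = np.where(mask, X, recon)   # impute missing entries from the fit

print("relative error on observed entries:",
      round(np.linalg.norm((recon - X)[mask]) / np.linalg.norm(X[mask]), 3))
```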
Keywords
Low-rank tensor factorization
Smoothing penalty
Missing data imputation
The exchange process of commodities is the essence of market dynamics. Features such as value, price, and satisfaction are commonly interpreted in the field of economics. However, this has also caught the attention of statistical physicists, who view the market as a statistical ensemble. Concepts borrowed from statistical mechanics, such as temperature or entropy, now appear in the understanding of market dynamics. In this context, we developed a microscopic model for the exchange of a basket of commodities. We consider that people value each commodity in an "individual and subjective" manner and eventually decide to exchange them in the market. We ran the model with a large number of agents acting as traders. We recorded all the trading actions and computed the statistical distribution of exchange ratios and the flux of commodities. These simulations allowed us to make a connection between price and the thermodynamic concept of temperature. The corresponding entropy of the system was also compared to that expected for a microscopic thermodynamic system.
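A toy agent-based sketch in the spirit of the model described; the trading rule and parameters below are illustrative inventions, not the authors' specification. Agents hold two commodities, assign them private subjective values, and trade whenever both sides perceive a gain, and the realized exchange ratios are recorded.

```python
# Hypothetical toy sketch: agents with private, subjective valuations of two
# commodities trade when both perceive a gain; realized exchange ratios recorded.
import numpy as np

rng = np.random.default_rng(9)
n_agents, n_rounds = 1000, 20000
# Each agent's subjective value of commodity B in units of commodity A.
subjective_ratio = rng.lognormal(mean=0.0, sigma=0.5, size=n_agents)
holdings = np.full((n_agents, 2), 50.0)          # columns: commodity A, commodity B

realized_ratios = []
for _ in range(n_rounds):
    i, j = rng.choice(n_agents, size=2, replace=False)
    # Trade 1 unit of B for `price` units of A, at a price between the two
    # subjective valuations, provided both sides can pay.
    low, high = sorted((subjective_ratio[i], subjective_ratio[j]))
    price = rng.uniform(low, high)
    buyer, seller = (i, j) if subjective_ratio[i] > subjective_ratio[j] else (j, i)
    if holdings[buyer, 0] >= price and holdings[seller, 1] >= 1.0:
        holdings[buyer, 0] -= price
        holdings[buyer, 1] += 1.0
        holdings[seller, 0] += price
        holdings[seller, 1] -= 1.0
        realized_ratios.append(price)

ratios = np.array(realized_ratios)
print("mean exchange ratio:", round(ratios.mean(), 3),
      "| spread (std):", round(ratios.std(), 3))
```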
Keywords
exchange value
valuation
econophysics
entropy
temperature
The Inequality Process (IP) (Angle, 1983-2022) is a stochastic particle system model of a process of competitive exclusion driving wealth production. The IP may be a natural law; it has been adopted within econophysics. Labor income statistics teem with invariant patterns implied by the IP. The IP also implies a number of statistical patterns, "stylized facts", in the market capitalizations of exchange-listed corporations. This paper identifies strategies used by buyers and sellers of listed stocks that are implied by the IP, putting those strategies on an econophysical footing. Recognized experts in quantitative finance claim that nothing like the IP operates in stock markets.
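For readers unfamiliar with the IP, a toy simulation of a one-parameter particle system in its spirit: in each binary encounter a fair coin picks the winner, who takes a fixed fraction omega of the loser's wealth. This is the commonly cited form of the rule, and the parameter values are arbitrary illustrations, not Angle's calibration.

```python
# Hypothetical toy sketch of an Inequality-Process-style particle system:
# random pairwise encounters, a fair coin picks the winner, and the loser
# surrenders a fixed fraction omega of its wealth. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(10)
n_particles, n_encounters, omega = 2000, 500_000, 0.3
wealth = np.ones(n_particles)                    # everyone starts equal

for _ in range(n_encounters):
    i, j = rng.integers(n_particles), rng.integers(n_particles)
    if i == j:
        continue
    winner, loser = (i, j) if rng.random() < 0.5 else (j, i)
    transfer = omega * wealth[loser]
    wealth[winner] += transfer
    wealth[loser] -= transfer

# Summaries of the resulting wealth distribution.
print("mean:", round(wealth.mean(), 3),
      "| spread (std/mean):", round(wealth.std() / wealth.mean(), 3))
```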
Keywords
competition
invariances
particle system
quantitative finance
stock market
trading strategies
First Author
John Angle, The Inequality Process Institute LLC
Presenting Author
John Angle, The Inequality Process Institute LLC
In sales operations, a customer is first assigned to a sales program (e.g., defined by market segmentation or customer prioritization) and then treated in doses by a sales team within that program (e.g., through meetings and pitches). We use these two levels of treatment assignment to deconfound the impact of a sales team's treatment dose on customer outcomes. First, for sales program assignment based on thresholding rules (e.g., customer spending), we apply regression discontinuity techniques to identify exogenous variation in that assignment process. Second, using this exogenous variation, we apply instrumental variables techniques to analyze the impact of sales treatments on a continuous scale. We present a case study and application examining the intent-to-treat and as-treated impact of sales specialists who focus on key product areas in Google's advertising business.
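A schematic sketch of the two-level design on simulated data: the running variable's threshold crossing supplies exogenous variation in program assignment (the regression discontinuity), which then instruments the continuous treatment dose in a 2SLS stage. The use of linearmodels and all variable names are assumptions for illustration.

```python
# Hypothetical sketch: fuzzy-RD-style instrument (above-threshold indicator) for
# program assignment, then 2SLS for the effect of treatment dose on outcome.
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(11)
n = 5000
spend = rng.normal(0, 1, n)                       # running variable (customer spend)
above = (spend > 0).astype(float)                 # thresholding rule for the program
program = (rng.random(n) < 0.2 + 0.6 * above).astype(float)   # fuzzy assignment
dose = 2 * program + rng.normal(0, 1, n)          # sales-team treatment dose
outcome = 1.5 * dose + spend + rng.normal(0, 1, n)

df = pd.DataFrame({"outcome": outcome, "dose": dose, "spend": spend,
                   "above": above, "const": 1.0})
local = df[df["spend"].abs() < 1.0]               # keep observations near the threshold

iv = IV2SLS(dependent=local["outcome"],
            exog=local[["const", "spend"]],       # control for the running variable
            endog=local["dose"],
            instruments=local["above"]).fit()
print(iv.summary)
```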
Keywords
Causal inference
Two-stage treatment
Regression discontinuity
Instrumental variables
Customer sales
Impact analysis
The NFL has had a problem with overtime for decades: the team with the first possession in overtime has had a distinct advantage under the sudden-death rules. In 2023, the NFL changed its playoff overtime rules with the aim of giving both teams an equal chance of winning regardless of who gets the ball first. An example under the new rules: Team A wins the coin toss and can elect to kick or receive. Suppose Team A decides to receive the ball. After Team A's possession, Team B gets a possession regardless of the outcome. If the score is tied after both possessions, the first team to score wins, with Team A to receive. The first time this rule was applied was in the 2024 Super Bowl, where the team that won the coin toss elected to receive and lost. This prompted many to declare that receiving first is the wrong decision; we argue the decision is more complex. To investigate whether it is better to receive the ball first or second under these new rules, we constructed a series of discrete-time Markov chain models to estimate the probability of winning for each team across a range of scoring probabilities. In particular, the Markov models allow Team B to change strategies in reaction to the outcome of Team A's possession.
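To make the setup concrete, a small probability calculation under a simplified, collapsed version of such a model: possession outcomes are limited to touchdown, field goal, or no score with made-up probabilities shared by both teams, Team B simply pursues whatever score it needs, and defensive scores and two-point decisions are ignored.

```python
# Hypothetical simplified calculation: P(Team A wins) when receiving first under
# the new OT rules, given per-possession outcome probabilities shared by both
# teams. Team B always pursues the score it needs; defensive scores, two-point
# tries, and clock effects are ignored.
def p_team_a_wins(p_td=0.35, p_fg=0.25):
    p_none = 1.0 - p_td - p_fg
    p_score = p_td + p_fg

    # Sudden death with Team A receiving first (possessions alternate until a score).
    p_sd = p_score / (1.0 - (1.0 - p_score) ** 2)

    win = 0.0
    win += p_td * ((1 - p_td) + p_td * p_sd)   # A TD: B must answer with a TD
    win += p_fg * (p_fg * p_sd + p_none)       # A FG: B TD wins, B FG ties, B none loses
    win += p_none * (p_none * p_sd)            # A empty: any B score wins for B
    return win

for p_td, p_fg in [(0.30, 0.20), (0.35, 0.25), (0.45, 0.25)]:
    print(f"p_td={p_td}, p_fg={p_fg}: P(Team A wins) = {p_team_a_wins(p_td, p_fg):.3f}")
```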
Keywords
Sports
National Football League
Markov Chain Models
Time series forecasting is essential for various real-world applications, often requiring domain expertise and extensive feature engineering, which can be time-consuming and knowledge-intensive. Deep learning offers a compelling alternative, enabling data-driven approaches to efficiently capture temporal dynamics. This talk introduces a new class of Transformer-based models for time series forecasting, leveraging attention mechanisms while integrating principles from classical time series methods to enhance their ability to learn complex patterns. These models are highly versatile, effectively handling both univariate and multivariate time series data. Empirical evaluations demonstrate significant improvements over conventional benchmarks, showcasing the practical effectiveness of these models.
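A minimal PyTorch sketch of attention-based forecasting in the spirit described above: a plain TransformerEncoder over sliding windows of a univariate series. The architecture and settings are a generic illustration, not the models introduced in the talk.

```python
# Hypothetical sketch: a generic Transformer encoder forecasting the next value
# of a univariate series from a sliding window of past observations.
import math
import torch
import torch.nn as nn

class TinyTimeSeriesTransformer(nn.Module):
    def __init__(self, d_model=32, nhead=4, num_layers=2, window=24):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                     # scalar -> model dim
        self.pos = nn.Parameter(torch.zeros(window, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=64, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)                      # one-step forecast

    def forward(self, x):            # x: (batch, window, 1)
        h = self.encoder(self.embed(x) + self.pos)
        return self.head(h[:, -1])   # predict from the last time step's encoding

# Train on sliding windows of a synthetic seasonal series.
torch.manual_seed(0)
t = torch.arange(600, dtype=torch.float32)
series = torch.sin(2 * math.pi * t / 24) + 0.1 * torch.randn(600)
window = 24
X = torch.stack([series[i:i + window] for i in range(len(series) - window)]).unsqueeze(-1)
y = series[window:].unsqueeze(-1)

model = TinyTimeSeriesTransformer(window=window)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(50):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print("final MSE:", float(loss))
```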
Keywords
Time Series Forecasting
Transformer Models
Deep Learning
Attention Mechanism
Temporal Dynamics
Univariate and Multivariate Forecasting
First Author
Thu Nguyen, University of Maryland-Baltimore County
Presenting Author
Thu Nguyen, University of Maryland-Baltimore County