Wednesday, Aug 6: 8:30 AM - 10:20 AM
4149
Contributed Papers
Music City Center
Room: CC-207D
This session explores cutting-edge approaches to predictive analytics and advanced statistical metrics in sports. Presentations will showcase innovative models, including probabilistic frameworks for soccer analytics, new evaluation metrics for NBA players, and dynamic strategies for in-game decision-making. Sports featured include tennis, baseball, basketball, and curling. Attendees will gain insights into leveraging data to enhance strategic decisions and performance evaluations across various sports contexts.
Main Sponsor
Section on Statistics in Sports
Presentations
The benefits of adolescents participating in sports have long been recognized and documented. However, a significant concern in youth sports is the high dropout rate among young athletes. Anecdotal evidence suggests that a lack of improvement over an extended period is one of the main reasons swimmers leave the sport. This study is the first to adopt a survival analysis framework to formally test this hypothesis. Using a large, publicly available database on competitive swimmers, this research examines how swimmers' performance improvement affects their decision to quit. Analyzing nearly 12,000 swimmers' meet performances over the last 10 years, we create two metrics to track performance improvement: one measures swimmers' self-improvement, and the other measures their improvement relative to peers. The main findings include (1) swimmers' absolute performance level and speed of improvement both influence their dropout probability, with absolute performance level being the more important factor; (2) swimmers who are faster when they are younger but slower when they grow up are more likely to quit; (3) if swimmers
Keywords
survival analysis
sports dropout
adolescent
large data
Team and player evaluation in professional sports is extremely important given the financial implications of success and failure. It is especially critical to identify and retain elite shooters in the National Basketball Association (NBA), one of the premier basketball leagues worldwide, because the ultimate goal of the game is to score more points than one's opponent. To this end, we propose two novel basketball metrics: "expected points" for team-based comparisons and "expected points above average (EPAA)" as a player-evaluation tool. Both metrics leverage posterior samples from a Bayesian hierarchical modeling framework to cluster teams and players based on their shooting propensities and abilities. We illustrate the concepts for the top 100 shot takers over the last decade and offer our metric as an additional tool for evaluating players.
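As a rough illustration of how shooting propensities and abilities could combine into an expected-points figure, the sketch below weights each shot type's point value by how often it is taken and how often it is made. The abstract does not give the authors' formulas, so the function names, the shot-type breakdown, and the definition of "above average" as a simple difference are all assumptions. The authors work from posterior samples rather than the point estimates used here.

```python
def expected_points_per_shot(profile):
    """Expected points per shot attempt for one shooter or team.

    profile maps a shot type to (propensity, make_probability, point_value).
    Propensities are assumed to sum to 1 across shot types.
    """
    return sum(prop * p_make * value for prop, p_make, value in profile.values())


def points_above_average(profile, league_profile):
    """Hypothetical 'above average' figure: shooter minus league baseline."""
    return expected_points_per_shot(profile) - expected_points_per_shot(league_profile)


# Made-up numbers, for illustration only.
shooter = {"two": (0.5, 0.6, 2), "three": (0.5, 0.4, 3)}
league = {"two": (0.6, 0.5, 2), "three": (0.4, 0.35, 3)}
print(expected_points_per_shot(shooter))
print(points_above_average(shooter, league))
```

With posterior samples, the same calculation would be repeated once per draw, giving a distribution for each figure instead of a single number.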
Keywords
Sports Analytics
Basketball
Bayesian Hierarchical Modeling
Despite the individual nature of racing, cross country is widely recognized as a team sport. While each runner competes for the fastest time, team success depends on collective performance, with scoring determined by the placement of a team's top five finishers. Coaches and athletes often cite the team nature of the sport as an important factor in athlete performance, yet this effect has not been statistically quantified. This study examines the impact of team participation on collegiate cross-country performance using observational race data. We analyze results from the Track and Field Results Reporting System (TFRRS), comparing runners competing as part of a team ("attached") to those running independently ("unattached"). Using a linear mixed-effects model, we control for confounders such as race distance, competition level, and athlete-specific variability. Our findings reveal a significant team effect, with attached runners holding an advantage over unattached runners on average. Our results provide empirical support for the long-held belief in cross country that running for a team enhances performance.
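The team-scoring rule mentioned above can be sketched in a few lines. This is a simplified version that sums the overall finishing places of each team's first five runners. It assumes teams with fewer than five finishers do not score, and it ignores finer rules such as re-ranking after unattached or incomplete-team runners are removed. The function name and the input layout are illustrative.

```python
from collections import defaultdict


def team_scores(finish_order, scorers=5):
    """Sum the finishing places of each team's first `scorers` runners.

    finish_order lists team labels in the order runners crossed the line.
    Teams with fewer than `scorers` finishers are left out (an assumption).
    The lowest total is the best score.
    """
    places = defaultdict(list)
    for place, team in enumerate(finish_order, start=1):
        if len(places[team]) < scorers:
            places[team].append(place)
    return {team: sum(p) for team, p in places.items() if len(p) == scorers}


# Two teams alternating places: A takes the odd places, B the even ones.
order = ["A", "B"] * 5
print(team_scores(order))
```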
Keywords
Mixed model
Sport
Cross Country
In soccer, game context can skew offensive statistics, potentially misrepresenting a team's performance. For example, the score often dictates tactical decisions (e.g., teams may adopt a more defensive approach when leading to limit the opponent's scoring opportunities). Additionally, extenuating circumstances such as red cards can disrupt the balance of play. We analyze minute-by-minute event-sequenced match data from 15 seasons across five major European leagues to examine how game context influences offensive performance in various statistical categories, including shot attempts, corner kicks, shots on goal, and expected goals (xG). Our analysis incorporates Generalized Additive Modeling (GAM) techniques with explanatory variables such as score differential, red card differential, home/away status, prematch win probabilities, and game minute. The chosen model is applied to project offensive numerical outputs onto a "common denominator" scenario: a tied home game played at even strength. This approach provides a more contextualized evaluation of teams' offensive performances, potentially yielding alternative insights into game dynamics.
Keywords
Generalized Additive Models
Model Selection
Negative Binomial
Sports Analytics
Zero-Inflated Poisson
Canadian curlers compete in annual national championships (the "Brier" for men and the "Scotties" for women). Regulations dictate that the championship entrants include at least one team from each Canadian province and territory. About a decade ago, curling officials gave an automatic bye to the national championships for the previous year's winner. The championships have now become 18-team events with the addition of "wild card" entrants. We develop a simulation model to determine the probability that any team captures the national championship trophy. We investigate the fairness of automatic byes for the previous year's champions. In addition, we compare the competitive depth of curling in the men's and women's events.
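A minimal Monte Carlo sketch of this kind of simulation is shown below. It is not the authors' model: it assumes a single round robin with the title going to the team with the most wins (ties broken at random), a logistic win probability driven by hypothetical team-strength values, and made-up team labels. The actual championship format, with pools and playoffs, is more involved.

```python
import math
import random


def win_prob(strength_a, strength_b):
    """Logistic probability that team A beats team B."""
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b)))


def simulate_title_odds(strengths, n_sims=2000, seed=0):
    """Estimate each team's chance of winning a single round robin."""
    rng = random.Random(seed)
    teams = list(strengths)
    titles = dict.fromkeys(teams, 0)
    for _ in range(n_sims):
        wins = dict.fromkeys(teams, 0)
        # Every pair of teams meets exactly once.
        for i, a in enumerate(teams):
            for b in teams[i + 1:]:
                if rng.random() < win_prob(strengths[a], strengths[b]):
                    wins[a] += 1
                else:
                    wins[b] += 1
        best = max(wins.values())
        # Ties for first place are broken at random (an assumption).
        champion = rng.choice([t for t in teams if wins[t] == best])
        titles[champion] += 1
    return {t: titles[t] / n_sims for t in teams}


# Hypothetical strengths on the logit scale.
odds = simulate_title_odds({"A": 1.5, "B": 0.0, "C": 0.0, "D": -1.0})
print(odds)
```

Comparing the estimated odds with and without an automatic entry for one team is one way to frame the fairness question in a sketch like this.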
Keywords
Sports
Simulation analysis
Logistic regression
Tennis traditionally relies on coach observations and pre-match analysis for player development and performance prediction. While sports analytics has revolutionised many aspects of the game, in-game betting strategies remain largely unexplored. This article attempts to fill the gap in the extant literature by proposing a novel Markov Decision Process (MDP) framework that provides real-time betting recommendations during a match. The proposed model assesses the evolving match dynamics and generates recommendations for the bettor after every game of a tennis match. These recommendations include:
a) Bet/No-Bet Decision: Advising whether to place a bet on any player or abstain.
b) Optimal Betting Fraction: Determining the optimal proportion of the available betting capital to allocate to the bet.
Unlike pre-game strategies based on rankings, this approach adapts to in-game dynamics. Tested on WTA matches, the MDP-based model outperforms traditional betting strategies, demonstrating its potential for optimising in-game tennis betting. Robustness checks are done to establish that the method works well across various scenarios.
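To make the two outputs concrete, the sketch below produces a bet/no-bet decision and a stake fraction from a win probability and the offered odds. It uses the classical Kelly criterion as a stand-in, not the MDP described in the abstract, whose details are not given here. The function names and the decimal-odds input are assumptions.

```python
def kelly_fraction(p_win, decimal_odds):
    """Kelly stake as a fraction of bankroll, floored at zero.

    p_win is the estimated probability that the backed player wins.
    decimal_odds is the payout per unit staked, including the stake.
    """
    b = decimal_odds - 1.0  # net profit per unit staked
    if b <= 0:
        raise ValueError("decimal_odds must be greater than 1")
    fraction = (p_win * b - (1.0 - p_win)) / b
    return max(fraction, 0.0)


def recommend(p_win, decimal_odds):
    """Return ('bet', fraction) when the edge is positive, else ('no-bet', 0.0)."""
    fraction = kelly_fraction(p_win, decimal_odds)
    return ("bet", fraction) if fraction > 0 else ("no-bet", 0.0)


print(recommend(0.6, 2.0))  # positive edge: stake a share of the bankroll
print(recommend(0.4, 2.0))  # negative edge: abstain
```

In an in-game setting, `p_win` would be re-estimated after every game of the match, so the recommendation can change as the match unfolds.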
Keywords
OR in sports
Markov decision process
Betting in tennis
In-game forecasting
This paper proposes a statistical model for player chemistry by extending the de facto Elo rating system. While various rating systems have been proposed, almost all of them assume that players' ratings are totally ordered and that transitivity holds. Such an assumption precludes the possibility that a specific player plays very well against another specific player regardless of their general ability. The proposed model consists of (i) a statistical test for the existence of pairwise player chemistry (intransitivity) for the entire group of players and (ii) estimation of the winning probability for each pair with the inclusion of player chemistry. We call our model the P-Elo model. We will compare the P-Elo model to the traditional Elo rating system on two sports, Sumo Wrestling (SW) and Mixed Martial Arts (MMA), as well as on the recently popular Large Language Model (LLM) evaluation, in terms of match/comparison result prediction and probability estimation.
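For reference, the standard Elo calculation that the model extends can be sketched as follows. The `chemistry` argument is a hypothetical pairwise offset added to the rating difference, included only to show how a pair-specific term can break transitivity. The abstract does not specify the P-Elo functional form, so this should not be read as the authors' model. The scale of 400 and K-factor of 32 are conventional defaults.

```python
def expected_score(r_a, r_b, chemistry=0.0, scale=400.0):
    """Probability that A beats B under Elo.

    chemistry is a hypothetical pairwise offset in rating points;
    with chemistry=0 this is the standard Elo expectation.
    """
    return 1.0 / (1.0 + 10.0 ** (-(r_a - r_b + chemistry) / scale))


def elo_update(r_a, r_b, score_a, k=32.0):
    """Standard Elo update after one match (score_a is 1, 0.5, or 0)."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


# Equal ratings give a 50% expectation; a positive offset favours A.
print(expected_score(1500, 1500))
print(expected_score(1500, 1500, chemistry=100))
print(elo_update(1500, 1500, 1.0))
```

Because the offset belongs to the pair rather than to either player, A can be favoured over B, and B over C, while C is still favoured over A.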
Keywords
Elo Ratings
Bradley-Terry model
Ranking systems
Statistical modelling
Time series
Sports
forecasting