Predictive Analytics and Advanced Metrics in Sports

Eric Gerber Chair
 
Wednesday, Aug 6: 8:30 AM - 10:20 AM
4149 
Contributed Papers 
Music City Center 
Room: CC-207D 
This session explores cutting-edge approaches to predictive analytics and advanced statistical metrics in sports. Presentations will showcase innovative models, including probabilistic frameworks for soccer analytics, new evaluation metrics for NBA players, and dynamic strategies for in-game decision-making. Sports featured include tennis, baseball, basketball, and curling. Attendees will gain insights into leveraging data to enhance strategic decisions and performance evaluations across various sports contexts.

Main Sponsor

Section on Statistics in Sports

Presentations

WITHDRAWN How Performance Improvement Affects Adolescent Dropout in Swimming: A Survival Analysis

The benefits of adolescent participation in sports have long been recognized and documented. However, a significant concern in youth sports is the high dropout rate among young athletes. Anecdotal evidence suggests that a lack of improvement over an extended period is one of the main factors that cause swimmers to leave the sport. This study is the first to adopt a survival analysis framework to formally test this hypothesis. Using a large, publicly available database on competitive swimmers, this research examines how swimmers' performance improvement affects their decision to quit. Analyzing nearly 12,000 swimmers' meet performances over the last 10 years, we create two metrics to track performance improvement: one measures swimmers' self-improvement, and the other measures their improvement relative to peers. The main findings include (1) swimmers' absolute performance level and speed of improvement both influence their dropout probability, with absolute performance level being the more important factor; (2) swimmers who are faster when they are younger but slower when they grow up are more likely to quit; (3) if swimmers 

Keywords

survival analysis

sports dropout

adolescent

large data 

First Author

Austin Yang

Expected Points Above Average: A Novel NBA Player Metric Based on Bayesian Hierarchical Modeling

Team and player evaluation in professional sports is extremely important given the financial implications of success and failure. It is especially critical to identify and retain elite shooters in the National Basketball Association (NBA), one of the premier basketball leagues worldwide, because the ultimate goal of the game is to score more points than one's opponent. To this end, we propose two novel basketball metrics: "expected points" for team-based comparisons and "expected points above average (EPAA)" as a player-evaluation tool. Both metrics leverage posterior samples from a Bayesian hierarchical modeling framework to cluster teams and players based on their shooting propensities and abilities. We illustrate the concepts for the top 100 shot takers over the last decade and offer our metric as an additional tool for evaluating players. 

Keywords

Sports Analytics

Basketball

Bayesian Hierarchical Modeling 

Co-Author(s)

Benjamin Williams, University of Denver
Erin Schliep, North Carolina State University
Bailey Fosdick, GTI Energy & Colorado School of Public Health

First Author

Ryan Elmore, University of Denver

Presenting Author

Ryan Elmore, University of Denver

Is Cross Country a Team Sport? A Statistical Perspective

Despite the individual nature of racing, cross country is widely recognized as a team sport. While each runner competes for the fastest time, team success depends on collective performance, with scoring determined by the placement of a team's top five finishers. Coaches and athletes often cite the team nature of the sport as an important factor in athlete performance, yet this effect has not been statistically quantified. This study examines the impact of team participation on collegiate cross country performance using observational race data. We analyze results from the Track and Field Results Reporting System (TFRRS), comparing runners competing as part of a team ("attached") to those running independently ("unattached"). Using a linear mixed-effects model, we control for confounders such as race distance, competition level, and athlete-specific variability. Our findings reveal a significant team effect: on average, attached runners hold a significant advantage over unattached runners. These results provide empirical support for the long-held belief in cross country that running for a team enhances performance. 
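One way to write down a model of the kind described is sketched here. This is only an illustration under stated assumptions, not the authors' specification: the response y_ij is assumed to be the finishing time (or pace) of athlete i in race j, "attached" is a 0/1 indicator, competition level enters as a set of indicator variables, and athlete-specific variability is captured by a random intercept u_i. All symbols are hypothetical.

```latex
y_{ij} = \beta_0
       + \beta_1\,\mathrm{attached}_{ij}
       + \beta_2\,\mathrm{distance}_{ij}
       + \boldsymbol{\gamma}^{\top}\mathbf{level}_{ij}
       + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Under these assumptions, the team effect is the coefficient on the attached indicator; if y is a finishing time, a negative value would correspond to attached runners being faster.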

Keywords

Mixed model

Sport

Cross Country 

Co-Author(s)

Brylee Wilcox
Sam Lee
Garritt Page, Brigham Young University

First Author

Nathan Sandholtz, Brigham Young University

Presenting Author

Brylee Wilcox

Leveraging Minute-by-Minute Soccer Match Data to Adjust Team Offensive Performance for Game Context

In soccer, game context can skew offensive statistics, potentially misrepresenting a team's performance. For example, the score often dictates tactical decisions (e.g., teams may adopt a more defensive approach when leading to limit the opponent's scoring opportunities). Additionally, extenuating circumstances such as red cards can disrupt the balance of play. We analyze minute-by-minute event-sequenced match data from 15 seasons across five major European leagues to examine how game context influences offensive performance in various statistical categories, including shot attempts, corner kicks, shots on goal, and expected goals (xG). Our analysis incorporates Generalized Additive Modeling (GAM) techniques with explanatory variables such as score differential, red card differential, home/away status, prematch win probabilities, and game minute. The chosen model is applied to project offensive numerical outputs onto a "common denominator" scenario: a tied home game played at even strength. This approach provides a more contextualized evaluation of teams' offensive performances, potentially yielding alternative insights into game dynamics. 

Keywords

Generalized Additive Models

Model Selection

Negative Binomial

Sports Analytics

Zero-Inflated Poisson 

Co-Author(s)

Ahmet Cemek, New College of Florida
David Gillman, New College of Florida

First Author

Andrey Skripnikov, New College of Florida

Presenting Author

Andrey Skripnikov, New College of Florida

Simulation modeling in the sport of curling: Evaluating men’s and women’s national competitions

Canadian curlers compete in annual national championships (the "Brier" for men and the "Scotties" for women). Regulations dictate that the championship entrants include at least one team from each Canadian province and territory. About a decade ago, curling officials gave an automatic bye to the national championships for the previous year's winner. The championships have now become 18-team events with the addition of "wild card" entrants. We develop a simulation model to determine the probability that any team captures the national championship trophy. We investigate the fairness of automatic byes for the previous year's champions. In addition, we compare the competitive depth of curling in the men's and women's events. 
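The general idea of estimating title probabilities by simulation can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes a single round robin with no pools or playoffs, hypothetical team strength values, a logistic form for the win probability, and random tiebreaks.

```python
import math
import random


def win_prob(strength_a, strength_b):
    """Logistic probability that team A beats team B (assumed functional form)."""
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b)))


def simulate_round_robin(strengths, rng):
    """Play one single round robin and return the index of the winning team."""
    n = len(strengths)
    wins = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < win_prob(strengths[i], strengths[j]):
                wins[i] += 1
            else:
                wins[j] += 1
    best = max(wins)
    # Ties for first are broken uniformly at random, a simplification
    # of real tiebreak rules.
    leaders = [k for k, w in enumerate(wins) if w == best]
    return rng.choice(leaders)


def title_probabilities(strengths, n_sims=2000, seed=0):
    """Estimate each team's title probability as its share of simulated wins."""
    rng = random.Random(seed)
    counts = [0] * len(strengths)
    for _ in range(n_sims):
        counts[simulate_round_robin(strengths, rng)] += 1
    return [c / n_sims for c in counts]
```

Calling title_probabilities with a list of 18 hypothetical strengths returns one estimated probability per team. Questions such as the fairness of a bye could then be explored by changing which teams enter the field and re-running the simulation.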

Keywords

Sports

Simulation analysis

Logistic regression 

Co-Author

Kent Kostuk, Engcomp

First Author

Keith Willoughby, Edwards School of Business

Presenting Author

Keith Willoughby, Edwards School of Business

WITHDRAWN Efficiency of live betting markets in tennis

Tennis traditionally relies on coach observations and pre-match analysis for player development and performance prediction. While sports analytics has revolutionised many aspects of the game, in-game betting strategies remain largely unexplored. This article attempts to fill this gap in the literature by proposing a novel Markov Decision Process (MDP) framework that provides real-time betting recommendations during a match. The proposed model assesses the evolving match dynamics and generates recommendations for the bettor after every game of a tennis match. These recommendations include:
a) Bet/No-Bet Decision: Advising whether to place a bet on any player or abstain.
b) Optimal Betting Fraction: Determining the optimal proportion of the available betting capital to allocate to the bet.
Unlike pre-game strategies based on rankings, this approach adapts to in-game dynamics. Tested on WTA matches, the MDP-based model outperforms traditional betting strategies, demonstrating its potential for optimising in-game tennis betting. Robustness checks are done to establish that the method works well across various scenarios. 
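The two outputs named above, a bet/no-bet decision and a betting fraction, can be illustrated with the classical Kelly criterion. This is a stand-in technique chosen for illustration and is not the MDP model of the abstract; the function names, the use of decimal odds, and the input win probability are all assumptions.

```python
def kelly_fraction(p_win, decimal_odds):
    """Kelly fraction of capital to stake, floored at zero (no negative bets)."""
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must lie in [0, 1]")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must exceed 1")
    b = decimal_odds - 1.0  # net profit per unit staked if the bet wins
    f = (p_win * b - (1.0 - p_win)) / b
    return max(0.0, f)


def recommend(p_win, decimal_odds):
    """Return ('bet', fraction) when the Kelly fraction is positive, else ('no bet', 0.0)."""
    f = kelly_fraction(p_win, decimal_odds)
    return ("bet", f) if f > 0 else ("no bet", 0.0)
```

In an in-game setting, p_win would be re-estimated after every game of the match and compared with the live odds, so the recommendation can change as the match evolves.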

Keywords

OR in sports

Markov decision process

Betting in tennis

In-game forecasting 

Co-Author(s)

Rishideep Roy, University of Essex
Soudeep Deb, Indian Institute of Management Bangalore

First Author

Chinmay Divekar, Indian Institute of Management Bangalore

Pairwise-Elo (P-Elo) Rating System

This paper proposes a statistical model for player chemistry by extending the de facto standard Elo rating system. While various rating systems have been proposed, almost all of them assume that players' ratings are totally ordered and that transitivity holds. Such an assumption precludes the possibility that a specific player plays very well against another specific player regardless of their general ability. The proposed model consists of (i) a statistical test for the existence of pairwise player chemistry (intransitivity) for the entire group of players and (ii) estimation of the winning probability for each pair with the inclusion of player chemistry. We call our model the P-Elo model. We will compare the P-Elo model to the traditional Elo rating system on two sports, Sumo Wrestling (SW) and Mixed Martial Arts (MMA), as well as on the recently popular Large Language Model (LLM) evaluation, in terms of match/comparison result prediction and probability estimation. 
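For reference, the standard Elo expected score and rating update can be sketched as below. The optional chemistry argument is a hypothetical pairwise offset added to the rating difference, included only to show how an intransitive term could enter; it is an assumption and not the P-Elo parameterization, which the abstract does not specify. The scale of 400 and K = 32 are conventional choices.

```python
def expected_score(r_a, r_b, chemistry=0.0):
    """Elo expected score of A against B.

    chemistry is a hypothetical pairwise offset in rating points that favours
    A when positive; the standard Elo model corresponds to chemistry = 0.
    """
    return 1.0 / (1.0 + 10 ** (-(r_a - r_b + chemistry) / 400.0))


def elo_update(r_a, r_b, score_a, k=32.0):
    """Standard Elo update after one game; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta
```

With equal ratings, the standard model gives each player an expected score of 0.5, while a positive chemistry value shifts that probability toward the first player without changing either rating. That is the kind of pair-specific effect a totally ordered rating cannot express.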

Keywords

Elo Ratings

Bradley-Terry model

Ranking systems

Statistical modelling

Time series

Sports forecasting 

Co-Author

Kazuhiko Shinki, Wayne State University

First Author

Kin Hang Wong, Wayne State University

Presenting Author

Kin Hang Wong, Wayne State University