Statistical Modeling and Performance Analysis in Sports

John Bassler, Chair
University of Alabama at Birmingham
 
Thursday, Aug 7: 8:30 AM - 10:20 AM
4215 
Contributed Papers 
Music City Center 
Room: CC-208A 
Talks in this section showcase statistical modeling techniques and performance analysis across diverse sports. Presentations will explore simulation models for strategic decision-making, comparisons of outcome models, motivational drivers, team dynamics, and data-driven updates for various applications. Attendees will discover how statistical methods can optimize performance and strategic planning in competitive sports.

Main Sponsor

Section on Statistics in Sports

Co Sponsors

Section on Statistics in Sports

Presentations

Bayesian reliability assessment for ordinal scoring system

Assessing the reliability of ordinal scoring systems is a common challenge in research, yet existing tools are limited. To address this gap, we propose a latent variable model that extends the intra-class correlation coefficient (ICC) to accommodate ordinal scores, providing greater flexibility for diverse study designs. The model incorporates mixed effects to account for random variability among raters and subjects. It is particularly suited for unbalanced designs, where the number of evaluations varies across raters and subjects. We develop a fully Bayesian framework for inference, enabling estimation of unknown parameters and evaluation of the reliability of ordinal scoring systems. Our results indicate that the proposed approach performs comparably to the cumulative link mixed model when sample sizes are large and significantly outperforms it in small-sample settings. In addition, our method is robust to unbalanced studies, where the number of observations per rater or subject varies. A key advantage of our method is the ability to directly obtain credible intervals for variance parameters and the ICC, which are challenging to estimate using other existing methods.
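As a rough illustration of how a credible interval for a latent-scale ICC could be read off posterior draws, here is a minimal Python sketch. It is not the authors' model: it assumes a probit-style latent scale with the residual variance fixed at 1, defines the ICC as the subject share of total latent variance, and uses simulated gamma draws as hypothetical stand-ins for real MCMC output. The function names are illustrative.

```python
import random

def latent_icc(var_subject, var_rater, var_resid=1.0):
    """ICC on the latent scale: share of latent variance due to subjects.

    Assumption: residual variance is fixed at 1 (probit-style latent scale).
    """
    return var_subject / (var_subject + var_rater + var_resid)

def credible_interval(draws, level=0.95):
    """Equal-tailed interval taken from the sorted posterior draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * (n - 1))]
    hi = ordered[int((1.0 - tail) * (n - 1))]
    return lo, hi

# Hypothetical posterior draws for the variance components; in practice
# these would come from an MCMC sampler, not from gammavariate.
rng = random.Random(0)
subject_draws = [rng.gammavariate(20.0, 0.1) for _ in range(2000)]
rater_draws = [rng.gammavariate(10.0, 0.05) for _ in range(2000)]

# Transform each joint draw into an ICC draw, then summarize.
icc_draws = [latent_icc(s, r) for s, r in zip(subject_draws, rater_draws)]
icc_lo, icc_hi = credible_interval(icc_draws)
print(f"95% credible interval for the ICC: ({icc_lo:.3f}, {icc_hi:.3f})")
```

Because the ICC is computed draw by draw, its interval follows directly from the joint posterior of the variance components, with no delta-method approximation.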

Keywords

Ordinal data

intra-class correlation

mixed effects

kappa statistic

Bayesian inference 

Co-Author(s)

Warwick Bayly, Department of Veterinary Clinical Sciences, Washington State University
Yuan Wang, Washington State University

First Author

Wiriyaporn Laaied, Washington State University

Presenting Author

Wiriyaporn Laaied, Washington State University

Beyond Expected Goals: A Probabilistic Framework for Shot Occurrences in Soccer

Expected goals (xG) models are widely used to evaluate team match performance in goal-based sports such as soccer and hockey. However, these models rely exclusively on observed shots, omitting crucial information about near-chances, shots nullified by fouls, and other unrealized shot attempts. Just as xG enhances our understanding of match outcomes by modeling the randomness of goal-scoring, we propose a method that accounts for the randomness of shot occurrence. We first fit a multinomial shot probability model to estimate the likelihood of different shot types occurring during a possession, enabling us to account for possessions where a shot could have occurred but did not. We then integrate these probabilities with existing xG models to construct a re-weighted expected goals metric that more accurately reflects team offensive performance and better aligns with intuitive evaluations of how the team played. Finally, we evaluate the effectiveness of our approach by comparing its descriptive and predictive power against standard xG models. We demonstrate that our refined metric provides a more comprehensive and accurate assessment of team quality and scoring potential.
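One plausible way to combine shot-occurrence probabilities with xG values is a probability-weighted sum per possession. The Python sketch below assumes that form; the abstract does not specify the exact re-weighting, and the shot types, probabilities, and xG values are hypothetical.

```python
def possession_expected_goals(shot_probs, xg_by_type):
    """Probability-weighted xG for one possession.

    shot_probs: multinomial probabilities over shot types, including a
                "no_shot" category, summing to 1.
    xg_by_type: xG value for each shot type (a missing type counts as 0,
                which is how "no_shot" contributes nothing).
    """
    total = sum(shot_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("shot probabilities must sum to 1")
    return sum(p * xg_by_type.get(kind, 0.0) for kind, p in shot_probs.items())

# Hypothetical possession: no shot was observed, yet the possession still
# carries value because a shot could have occurred.
probs = {"no_shot": 0.7, "open_play": 0.2, "header": 0.1}
xg = {"open_play": 0.10, "header": 0.05}
value = possession_expected_goals(probs, xg)
print(f"possession value: {value:.3f}")
```

Summing this quantity over all of a team's possessions gives a match-level total that credits shotless possessions, unlike a standard xG sum over observed shots.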

Keywords

Expected Goals

Sports Analytics

Statistical Learning

Team Performance

Multinomial Models 

Co-Author

R. Paul Sabin

First Author

Jonathan Pipping, The Wharton School, Department of Statistics & Data Science

Presenting Author

Jonathan Pipping, The Wharton School, Department of Statistics & Data Science

Comparison of traditional versus duration specific performance measurements used to select a cricket

As the International Cricket Council (ICC) strives to broaden the appeal of cricket, cricket analytics has grown correspondingly. Three decades ago, interest in cricket was concentrated on the Asian subcontinent, in select former British colonies, and in the UK. The landscape has changed as other countries have become more engaged; for example, twenty countries competed in the 2024 men's T20 World Cup, hosted jointly by the USA and the Caribbean islands of the West Indies.

Optimizing the selection of a cricket team using an integer program was proposed by Gerber and Sharp (2006) and has been adopted by others over the years. The methodology was extended to relay swimming team selection and continues to garner interest in other sporting codes. This paper returns to the origins of the methodology and assesses performance measures proposed for each specific format of international cricket, from the longer format (five-day Test matches) through to the shorter format (20-over contests).
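The idea of team selection as a binary optimization problem can be sketched in Python as follows. This is not the Gerber and Sharp (2006) formulation: it assumes a single performance score per player and minimum counts per role, and it replaces an integer programming solver with brute-force enumeration, which is only feasible for a small pool. The player pool and scores are hypothetical.

```python
from itertools import combinations

def select_team(players, team_size, min_per_role):
    """Pick the highest-scoring team that meets minimum role counts.

    players: list of (name, role, score) tuples.
    Enumerates every subset of the given size, so use small pools only.
    """
    best_team, best_score = None, float("-inf")
    for team in combinations(players, team_size):
        counts = {}
        for _, role, _ in team:
            counts[role] = counts.get(role, 0) + 1
        # Skip teams that violate any role constraint.
        if any(counts.get(role, 0) < k for role, k in min_per_role.items()):
            continue
        score = sum(s for _, _, s in team)
        if score > best_score:
            best_team, best_score = team, score
    return best_team, best_score

# Hypothetical pool; scores stand in for a format-specific performance measure.
pool = [
    ("A", "batter", 9), ("B", "batter", 8), ("C", "batter", 7),
    ("D", "bowler", 6), ("E", "bowler", 5), ("F", "keeper", 3),
]
team, score = select_team(pool, team_size=4,
                          min_per_role={"batter": 1, "bowler": 1, "keeper": 1})
print(sorted(name for name, _, _ in team), score)
```

Swapping in a different performance measure only changes the score column, which is why the choice of measure for each format matters to the selected team.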

Keywords

Team selection

optimization

cricket 

First Author

Gary Sharp, Nelson Mandela University

Presenting Author

Gary Sharp, Nelson Mandela University

Down, set, hut! Explaining variability in snap timing on plays with motion

Player tracking data have provided great opportunities to generate novel insights into understudied areas of American football, such as pre-snap motion. Using a Bayesian multilevel model, we provide an assessment of a quarterback's ability to adapt and align the ball snap with pre-snap motion from their teammates. We focus on pass plays with receivers in motion at the snap and running a route, and define the snap timing as the time between the start of the receiver's motion and the ball snap. We assume a Gamma distribution for the play-level snap timing and model the mean parameter with player and team random effects, along with relevant fixed effects such as the motion type identified via a Gaussian mixture model. Most importantly, we model the shape parameter with quarterback random effects, which enables us to estimate the differences in snap timing variability among NFL quarterbacks. We demonstrate that higher variability in snap timing is beneficial for the passing game, as it relates to facing less havoc created by the opposing defense. We also obtain a quarterback leaderboard based on our snap timing variability measure, and Patrick Mahomes stands out as the top-rated player.
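To see why the shape parameter governs variability, consider a Gamma distribution parameterized by its mean and shape: the standard deviation is the mean divided by the square root of the shape, so a smaller shape means more spread at the same mean. The Python sketch below assumes that mean-shape parameterization (the abstract does not state it explicitly) and uses hypothetical values for two quarterbacks.

```python
import math
import random
import statistics

def gamma_sd(mean, shape):
    """Standard deviation of a Gamma with the given mean and shape."""
    return mean / math.sqrt(shape)

def simulate_snap_times(mean, shape, n, seed=0):
    """Draw n snap timings; scale = mean / shape keeps the mean fixed."""
    rng = random.Random(seed)
    scale = mean / shape
    return [rng.gammavariate(shape, scale) for _ in range(n)]

# Two hypothetical quarterbacks with the same mean timing (2 seconds)
# but different shape parameters.
steady = simulate_snap_times(mean=2.0, shape=16.0, n=5000, seed=0)
varied = simulate_snap_times(mean=2.0, shape=4.0, n=5000, seed=1)

print(f"steady: theoretical sd {gamma_sd(2.0, 16.0):.2f}, "
      f"sample sd {statistics.stdev(steady):.2f}")
print(f"varied: theoretical sd {gamma_sd(2.0, 4.0):.2f}, "
      f"sample sd {statistics.stdev(varied):.2f}")
```

Under this parameterization, a quarterback-level effect on the shape changes timing variability without moving the average snap timing.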

Keywords

Bayesian statistics

mixed-effects model

uncertainty quantification

tracking data

American football

statistics in sports 

Co-Author

Ronald Yurko, Department of Statistics & Data Science, Carnegie Mellon University

First Author

Quang Nguyen, Carnegie Mellon University

Presenting Author

Quang Nguyen, Carnegie Mellon University

On the Evolution of the Called Strike Zone in Major League Baseball from 2008-2023

The accuracy and consistency with which home plate umpires call pitches can play a significant role in the outcomes of Major League Baseball (MLB) games. Here, we investigate trends in called pitch accuracy and within-game consistency for all regular season MLB games spanning 2008-2023. Using neural networks followed by the fitting of extended superelliptical models, we map the locations of these pitches to quantify how well the geometry of the called strike zone aligns with that of the rule-book strike zone in each season. Our results show steady improvement in both the accuracy and the within-game consistency of the called strike zone, and that by 2023 its geometry aligned with that of the rule-book strike zone in every respect except its size, which was about 10% too large. We discuss the impact of these findings on the forthcoming implementation of a robo-ump challenge system in MLB.
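For intuition about superelliptical zone shapes, the Python sketch below uses the standard superellipse |x/a|^n + |z/b|^n <= 1, not the authors' extended model, whose form the abstract does not give. The exponent n controls how boxy the shape is: n = 2 is an ellipse, and large n approaches the rectangle with half-widths a and b. All parameter values are hypothetical.

```python
import math

def inside_superellipse(x, z, x0, z0, a, b, n):
    """True if (x, z) lies in the superellipse centered at (x0, z0)."""
    return abs((x - x0) / a) ** n + abs((z - z0) / b) ** n <= 1.0

def superellipse_area(a, b, n):
    """Closed-form area: 4ab * Gamma(1 + 1/n)^2 / Gamma(1 + 2/n)."""
    return 4.0 * a * b * math.gamma(1.0 + 1.0 / n) ** 2 / math.gamma(1.0 + 2.0 / n)

# Hypothetical half-width and half-height, in feet.
a, b = 0.83, 1.0
rectangle_area = 4.0 * a * b
for n in (2, 4, 8):
    share = superellipse_area(a, b, n) / rectangle_area
    print(f"n={n}: covers {share:.1%} of the bounding rectangle")

# With a finite exponent the corners are rounded off, so a pitch at the
# very corner of the bounding rectangle falls outside the zone.
print(inside_superellipse(a, b, 0.0, 0.0, a, b, 4))
```

Fitting a, b, n, and the center to called strikes in each season would give one way to compare the called zone's size and squareness with the rule-book rectangle.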

Keywords

sports statistics

misclassification rate

robo-umps

morphometrics 

First Author

Dale Zimmerman, University of Iowa

Presenting Author

Dale Zimmerman, University of Iowa