Thursday, Aug 7: 8:30 AM - 10:20 AM
4215
Contributed Papers
Music City Center
Room: CC-208A
Talks in this section showcase statistical modeling techniques and performance analysis across diverse sports. Presentations will explore simulation models for strategic decision-making, comparisons of outcome models, motivational drivers, team dynamics, and data-driven updates for various applications. Attendees will discover how statistical methods can optimize performance and strategic planning in competitive sports.
Main Sponsor
Section on Statistics in Sports
Co Sponsors
Section on Statistics in Sports
Presentations
Assessing the reliability of ordinal scoring systems is a common challenge in research, yet existing tools are limited. To address this gap, we propose a latent variable model that extends the intra-class correlation coefficient (ICC) to accommodate ordinal scores, providing greater flexibility for diverse study designs. The model incorporates mixed effects to account for random variability among raters and subjects. It is particularly suited for unbalanced designs, where the number of evaluations varies across raters and subjects. We develop a full Bayesian framework for inference, enabling estimation of unknown parameters and evaluation of the reliability of ordinal scoring systems. Our results indicate that the proposed approach performs comparably to the cumulative link mixed model when sample sizes are large and significantly outperforms it in small-sample settings. In addition, our method is robust to unbalanced studies, where the number of observations per rater or subject varies. A key advantage of our method is the ability to directly obtain credible intervals for variance parameters and the ICC, which are challenging to estimate using other existing methods.
Keywords
Ordinal data
intra-class correlation
mixed effects
kappa statistic
Bayesian inference
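As a rough illustration of the ordinal-reliability abstract above, one common way to extend the ICC to ordinal scores is through a thresholded latent variable. The abstract does not give the authors' exact specification, so the symbols below (latent score y*, subject effect s, rater effect r, cutpoints tau) and the normality assumptions are illustrative only.

```latex
% Illustrative latent-variable sketch; an assumption, not the authors' stated model.
% Subject i is scored by rater j on an ordinal scale with K categories.
\begin{align*}
y^{*}_{ij} &= \mu + s_i + r_j + \varepsilon_{ij}, \\
s_i &\sim N(0, \sigma_s^2), \quad r_j \sim N(0, \sigma_r^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2), \\
y_{ij} &= k \iff \tau_{k-1} < y^{*}_{ij} \le \tau_k, \quad k = 1, \dots, K, \quad -\infty = \tau_0 < \tau_1 < \dots < \tau_K = \infty, \\
\mathrm{ICC} &= \frac{\sigma_s^2}{\sigma_s^2 + \sigma_r^2 + \sigma_\varepsilon^2}.
\end{align*}
```

In a formulation like this, the residual variance is usually fixed (for example at 1) for identifiability, and unbalanced designs pose no special difficulty because each observed pair (i, j) contributes its own likelihood term. In a Bayesian fit, a credible interval for the ICC follows directly from the posterior draws of the variance components.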
Expected goals (xG) models are widely used to evaluate team match performance in goal-based sports such as soccer and hockey. However, these models rely exclusively on observed shots, omitting crucial information about near-chances, shots nullified by fouls, and other unrealized shot attempts. Just as xG enhances our understanding of match outcomes by modeling the randomness of goal-scoring, we propose a method that accounts for the randomness of shot occurrence. We first fit a multinomial shot probability model to estimate the likelihood of different shot types occurring during a possession, enabling us to account for possessions where a shot could have occurred but did not. We then integrate these probabilities with existing xG models to construct a re-weighted expected goals metric that more accurately reflects team offensive performance and better aligns with intuitive evaluations of how they played. Finally, we evaluate the effectiveness of our approach by comparing its descriptive and predictive power against standard xG models. We demonstrate that our refined metric provides a more comprehensive and accurate assessment of team quality and scoring potential.
Keywords
Expected Goals
Sports Analytics
Statistical Learning
Team Performance
Multinomial Models
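The re-weighting idea in the expected goals abstract above can be sketched in a few lines: for each possession, multiply the probability that each shot type occurs by the xG of that shot type, then sum over possessions. This is a minimal sketch under assumed inputs. The shot-type names, probabilities, and xG values are hypothetical, and the authors' actual combination rule may differ.

```python
# Hypothetical numbers throughout; shot types and values are illustrative assumptions.

def reweighted_xg(possessions):
    """Sum, over possessions, of P(shot type occurs) * xG(shot type).

    Each possession is a dict mapping a shot type to a (probability, xg) pair.
    Probabilities within a possession may sum to less than 1; the remainder is
    the probability that no shot occurs, which contributes zero goals.
    """
    total = 0.0
    for shot_types in possessions:
        prob_sum = sum(p for p, _ in shot_types.values())
        if prob_sum > 1.0 + 1e-9:
            raise ValueError("shot-type probabilities in a possession exceed 1")
        total += sum(p * xg for p, xg in shot_types.values())
    return total


# Three made-up possessions, including one where a shot was unlikely.
possessions = [
    {"header": (0.10, 0.08), "open_play": (0.30, 0.12)},
    {"open_play": (0.05, 0.10)},
    {"header": (0.20, 0.05), "open_play": (0.50, 0.20)},
]
value = reweighted_xg(possessions)
```

Because every possession contributes, including those that ended without a shot, a team that repeatedly reaches dangerous positions receives credit even when no shot is recorded.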
As the International Cricket Council (ICC) works to broaden the appeal of cricket, cricket analytics has grown in step. Three decades ago, interest in cricket was concentrated in the Asian sub-continent, select former British colonies, and the UK. The landscape has since changed as other countries have become more engaged; for example, twenty countries competed in the 2024 men's T20 World Cup, hosted jointly by the USA and the Caribbean islands of the West Indies.
Optimizing the selection of a cricket team using an integer program was proposed by Gerber and Sharp (2006) and has been adopted by others over the years. The methodology was extended to relay swimming team selection and continues to garner interest in other sporting codes. This paper returns to the origins of the methodology and assesses performance measures proposed for specific formats of international cricket, from the longest format (five-day Test matches) through to the shortest (20-over contests).
Keywords
Team selection
optimization
cricket
First Author
Gary Sharp, Nelson Mandela University
Presenting Author
Gary Sharp, Nelson Mandela University
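The team-selection problem in the cricket abstract above has the general shape of a 0-1 integer program: choose binary indicators x_i to maximize the total of x_i times a performance score, subject to a fixed team size and minimum counts per role. The sketch below solves a toy instance by brute-force enumeration, which stands in for an integer programming solver. It is not the Gerber and Sharp (2006) formulation, and the player pool, roles, scores, and constraints are all hypothetical.

```python
from itertools import combinations

# Hypothetical player pool: (name, role, performance score). All values are made up.
PLAYERS = [
    ("A", "batter", 42.0), ("B", "batter", 38.5), ("C", "batter", 35.0),
    ("D", "bowler", 30.0), ("E", "bowler", 28.0), ("F", "bowler", 25.5),
    ("G", "keeper", 27.0), ("H", "keeper", 22.0), ("I", "allrounder", 33.0),
]
MIN_PER_ROLE = {"batter": 2, "bowler": 2, "keeper": 1}  # assumed constraints
TEAM_SIZE = 6  # kept small so enumeration stays cheap


def select_team(players, team_size, min_per_role):
    """Return the feasible team with the highest total score, and that score."""
    best_team, best_score = None, float("-inf")
    for team in combinations(players, team_size):
        # Count how many selected players fill each role.
        counts = {}
        for _, role, _ in team:
            counts[role] = counts.get(role, 0) + 1
        # Skip teams that violate any minimum-role constraint.
        if any(counts.get(role, 0) < m for role, m in min_per_role.items()):
            continue
        score = sum(s for _, _, s in team)
        if score > best_score:
            best_team, best_score = team, score
    return best_team, best_score


team, score = select_team(PLAYERS, TEAM_SIZE, MIN_PER_ROLE)
names = [name for name, _, _ in team]
```

Enumeration is only practical for small pools. With a realistic squad, the same objective and constraints would be passed to an integer programming solver, and the choice of performance score would depend on the match format.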
Player tracking data have provided great opportunities to generate novel insights into understudied areas of American football, such as pre-snap motion. Using a Bayesian multilevel model, we provide an assessment of a quarterback's ability to adapt and align the ball snap with pre-snap motion from their teammates. We focus on pass plays in which a receiver is in motion at the snap and runs a route, and define the snap timing as the time between the start of the receiver's motion and the ball snap. We assume a Gamma distribution for the play-level snap timing and model the mean parameter with player and team random effects, along with relevant fixed effects such as the motion type identified via a Gaussian mixture model. Most importantly, we model the shape parameter with quarterback random effects, which enables us to estimate the differences in snap timing variability among NFL quarterbacks. We demonstrate that higher variability in snap timing is beneficial for the passing game, as it relates to facing less havoc created by the opposing defense. We also obtain a quarterback leaderboard based on our snap timing variability measure, and Patrick Mahomes stands out as the top-rated player.
Keywords
Bayesian statistics
mixed-effects model
uncertainty quantification
tracking data
American football
statistics in sports
Co-Author
Ronald Yurko, Department of Statistics & Data Science, Carnegie Mellon University
First Author
Quang Nguyen, Carnegie Mellon University
Presenting Author
Quang Nguyen, Carnegie Mellon University
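To see why a quarterback effect on the Gamma shape parameter captures snap timing variability, as described in the abstract above, it helps to write the Gamma distribution in its mean and shape parameterization. The log links and the symbols below (beta, b, gamma_0, u) are assumptions made for illustration; the abstract does not state the exact link functions or priors.

```latex
% Illustrative sketch; link functions and notation are assumptions.
% T_i is the snap timing on play i, and q[i] is the quarterback on that play.
\begin{align*}
T_i &\sim \mathrm{Gamma}\!\left(\text{shape} = \alpha_{q[i]},\ \text{rate} = \alpha_{q[i]} / \mu_i\right), \\
\mathbb{E}[T_i] &= \mu_i, \qquad \mathrm{Var}(T_i) = \frac{\mu_i^2}{\alpha_{q[i]}}, \\
\log \mu_i &= \mathbf{x}_i^{\top} \boldsymbol{\beta} + b_{\text{player}[i]} + b_{\text{team}[i]}, \\
\log \alpha_q &= \gamma_0 + u_q, \qquad u_q \sim N(0, \sigma_u^2).
\end{align*}
```

Under this parameterization the coefficient of variation is 1 / sqrt(alpha_q), so a smaller shape for a quarterback means more variable snap timing at any given mean.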
The accuracy and consistency with which home plate umpires call pitches can play a significant role in the outcomes of Major League Baseball (MLB) games. Here, we investigate trends in called pitch accuracy and within-game consistency for all regular-season MLB games spanning 2008-2023. Using neural networks followed by the fitting of extended superelliptical models, we map the locations of these pitches to quantify how well the geometry of the called strike zone aligns with that of the rule-book strike zone in each season. Our results show that there was steady improvement in both accuracy and within-game consistency of the called strike zone, and that by 2023 its geometry aligned with that of the rule-book strike zone in every respect except its size, which was about 10% too large. The impact of these findings on the forthcoming implementation of a robo-ump challenge system in MLB is discussed.
Keywords
sports statistics
misclassification rate
robo-umps
morphometrics
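For readers unfamiliar with superelliptical shapes, the sketch below uses the plain superellipse (Lamé curve) |x/a|^n + |y/b|^n = 1 to show how a fitted zone's size could be compared with a reference zone. The abstract above does not describe the "extended" model, so this is only the basic curve, and every parameter value here is hypothetical rather than taken from the study.

```python
import math


def superellipse_area(a, b, n):
    """Area enclosed by |x/a|**n + |y/b|**n = 1, for a, b, n > 0."""
    return 4.0 * a * b * math.gamma(1.0 + 1.0 / n) ** 2 / math.gamma(1.0 + 2.0 / n)


def inside(x, y, a, b, n, cx=0.0, cy=0.0):
    """True if the point (x, y) lies in the superellipse centred at (cx, cy)."""
    return abs((x - cx) / a) ** n + abs((y - cy) / b) ** n <= 1.0


# Hypothetical half-widths, half-heights, and exponents (arbitrary units).
# A large exponent gives a nearly rectangular zone; a smaller one rounds the corners.
reference_area = superellipse_area(1.0, 1.2, 50.0)
fitted_area = superellipse_area(1.05, 1.26, 6.0)
size_ratio = fitted_area / reference_area
```

With n = 2 the curve is an ordinary ellipse, and as n grows it approaches a rectangle with sides 2a and 2b, which is why this family suits a called strike zone with rounded corners. A size ratio above 1 would indicate a fitted zone larger than the reference.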