Leveraging Sequential Play-by-Play Data to Adjust for Complementary Unit Performance in American College Football

Conference: Symposium on Data Science and Statistics (SDSS) 2023
05/25/2023: 3:50 PM - 4:15 PM CDT
Refereed 

Description

American football is unusual in that a team's offensive and defensive units typically consist of separate sets of players who are never on the field at the same time, which makes it tempting to evaluate them independently. Yet some aspects of your team's defensive (offensive) performance may directly affect the complementary unit, a concept commonly referred to as "complementary football". For example, turnovers forced by your defense can give your offense easier scoring chances, while your offense's ability to control the clock may in turn help your defense. Moreover, the ability to objectively rank teams' offenses and defenses may be especially important in American college football (CFB), where such rankings carry heavy title and playoff implications. Our main goal is to identify, in a data-driven way, the most consistently influential features of complementary football, and then to adjust each team's offensive (defensive) performance for that of its complementary unit. To that end, for the 2014-2021 CFB seasons, we leverage sequential play-by-play data to alleviate the reverse causality that permeates game-level totals, focusing on how the complementary unit's (e.g., the defense's) performance on the preceding drive may affect the other unit's (the offense's) performance on the current drive. Variable selection methods are used to pick the most important complementary football features, which we then adjust for alongside strength of schedule and home-field factors (both shown to be especially pivotal in the college game). Together, these adjustments should yield a better understanding of each team's offensive and defensive rankings and a more careful evaluation of their strengths and weaknesses.
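The drive-level idea described above, pairing each offensive drive with the same team's defensive performance on the preceding drive, might be sketched as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: the `Drive` record, its field names, the result codes, the turnover definition, and the helper `add_complementary_features` are all hypothetical, and drives are assumed to be sorted by game and then chronologically.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


# Hypothetical drive record; field names and result codes are illustrative
# assumptions, not the schema used in the study.
@dataclass
class Drive:
    game_id: str
    offense: str       # team with the ball on this drive
    defense: str       # team defending on this drive
    result: str        # e.g. "TD", "FG", "PUNT", "INT", "FUMBLE"
    duration_sec: int  # time elapsed during the drive
    points: int        # points scored by the offense on this drive


# Assumed set of drive outcomes counted as turnovers forced by the defense.
TURNOVER_RESULTS = {"INT", "FUMBLE"}


def add_complementary_features(drives: list[Drive]) -> list[dict]:
    """Pair each drive with the same team's DEFENSIVE performance on the most
    recent earlier drive of the same game (features are None if none exists).

    Assumes `drives` is sorted by game and then chronologically within a game.
    """
    rows = []
    for i, cur in enumerate(drives):
        prev: Optional[Drive] = None
        # Walk backwards, staying within the current game, until we find a
        # drive on which the current offense's team was on defense.
        for j in range(i - 1, -1, -1):
            cand = drives[j]
            if cand.game_id != cur.game_id:
                break
            if cand.defense == cur.offense:
                prev = cand
                break
        rows.append({
            "game_id": cur.game_id,
            "offense": cur.offense,
            "points": cur.points,  # response: offensive output on this drive
            "prev_def_forced_turnover": None if prev is None else prev.result in TURNOVER_RESULTS,
            "prev_def_time_on_field": None if prev is None else prev.duration_sec,
            "prev_def_points_allowed": None if prev is None else prev.points,
        })
    return rows
```

Because each complementary feature is taken from a drive that ended before the current drive began, this ordering reflects the intuition behind using sequential data to alleviate reverse causality; rows like these could then feed a variable-selection step such as the LASSO listed among the keywords.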

Keywords

sport analytics

variable selection

LASSO

natural splines

causality 

Presenting Author

Andrey Skripnikov, New College of Florida

First Author

Andrey Skripnikov, New College of Florida

Target Audience

Mid-Level

Tracks

Practice and Applications
Symposium on Data Science and Statistics (SDSS) 2023