There are two parts to the code for this project:
In the first part, pitch-by-pitch Statcast data is used to build a run expectancy matrix covering the 24 base-out states
In the second part, the value of each base-out state is used to add a column to the Statcast data, 'RUNS_VALUE', that estimates the change in run expectancy for each at bat using the RE24 formula:
Run Expectancy in End State - Run Expectancy in Beginning State + Runs Scored
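As a rough illustration of the first part, the sketch below builds a small run expectancy table in base R from made-up plate appearances. The column names (`half_inning`, `bases`, `outs`, `runs_on_play`) are hypothetical stand-ins rather than the actual Statcast fields, and the project's own code may be organized differently. The idea is to compute, for every play, the runs scored from that play through the end of the half inning, then average those values within each base-out state.

```r
# Toy plate-appearance data for two half innings (all values are made up).
# Column names are hypothetical, not the real Statcast field names.
pa <- data.frame(
  half_inning  = c(rep("A", 6), rep("B", 3)),
  bases        = c("000", "100", "100", "110", "010", "010", "000", "000", "000"),
  outs         = c(0, 0, 1, 1, 1, 2, 0, 1, 2),
  runs_on_play = c(0, 0, 0, 2, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Runs scored from each play through the end of its half inning
# (a reverse cumulative sum within each half inning).
pa$runs_roi <- ave(pa$runs_on_play, pa$half_inning,
                   FUN = function(r) rev(cumsum(rev(r))))

# Average runs-to-end-of-inning for each base-out state.
# Rows are base configurations (1B/2B/3B occupied flags), columns are outs.
re_matrix <- tapply(pa$runs_roi, list(bases = pa$bases, outs = pa$outs), mean)

print(round(re_matrix, 2))
```

With a full season of data every one of the 8 base configurations and 3 out counts would be populated; in this toy example the states that never occur simply show up as `NA`.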
Here is an example from the dataset after the 'RUNS_VALUE' column was added, showing a random half inning
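To make the RE24 formula concrete, the sketch below applies it to a hypothetical half inning. The run expectancy values in `re` are illustrative numbers, not estimates from the Statcast data, and the state labels and column names other than 'RUNS_VALUE' are assumptions made for this example.

```r
# Illustrative run expectancy values keyed by "bases outs" (made up, not estimated).
# "END" stands for the end of the half inning, where run expectancy is 0.
re <- c("000 0" = 0.50, "100 0" = 0.90, "100 1" = 0.55,
        "000 1" = 0.27, "000 2" = 0.10, "END" = 0)

# A hypothetical half inning: single, strikeout, two-run home run, two outs.
half <- data.frame(
  event       = c("single", "strikeout", "home run", "groundout", "flyout"),
  start_state = c("000 0", "100 0", "100 1", "000 1", "000 2"),
  end_state   = c("100 0", "100 1", "000 1", "000 2", "END"),
  runs_scored = c(0, 0, 2, 0, 0),
  stringsAsFactors = FALSE
)

# RE24 formula: RE(end state) - RE(beginning state) + runs scored.
half$RUNS_VALUE <- unname(re[half$end_state] - re[half$start_state]) + half$runs_scored

print(half)
```

A handy sanity check is that the run values in a complete half inning sum to the runs scored minus the run expectancy of the starting state, here 2 - 0.50 = 1.50.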
After the change in run expectancy for each play is estimated, players can be evaluated by the total change in run expectancy that occurs while they are on the mound or at the plate
- My particular Statcast dataset only lists the pitcher's name, so I chose the following three pitchers from 2022 to evaluate: Cole Irvin, Luis Severino, and Corbin Burnes
First, the number of times each pitcher faced a batter in each base state is displayed
The cumulative sum of the change in run expectancy is displayed alongside it
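A minimal sketch of these two summaries in base R follows. It assumes one row per batter faced, with hypothetical `pitcher` and `base_state` columns next to 'RUNS_VALUE'; the pitcher labels and values are placeholders, not real results.

```r
# Placeholder data: one row per batter faced (values are made up).
pitches <- data.frame(
  pitcher    = c("Pitcher A", "Pitcher A", "Pitcher A",
                 "Pitcher B", "Pitcher B", "Pitcher B"),
  base_state = c("000", "100", "000", "000", "000", "110"),
  RUNS_VALUE = c(-0.25, 0.40, -0.17, 1.00, -0.25, -0.60),
  stringsAsFactors = FALSE
)

# Number of batters faced by each pitcher in each base state.
counts <- table(pitches$pitcher, pitches$base_state)

# Total change in run expectancy per pitcher and base state.
totals <- aggregate(RUNS_VALUE ~ pitcher + base_state, data = pitches, FUN = sum)

# Running (cumulative) sum of run value within each pitcher, in row order.
pitches$cum_run_value <- ave(pitches$RUNS_VALUE, pitches$pitcher, FUN = cumsum)

print(counts)
print(totals)
print(pitches)
```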
Next, a strip chart can be created to visualize every individual change in run expectancy, grouped by base state, for each player
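One way to draw such a chart is base R's `stripchart()`, sketched below with one panel per pitcher. The helper name `plot_run_values` and the `pitcher` and `base_state` columns are assumptions for illustration; the project may well use a different plotting approach such as ggplot2.

```r
# Placeholder data: one row per batter faced (values are made up).
pitches <- data.frame(
  pitcher    = c("Pitcher A", "Pitcher A", "Pitcher A",
                 "Pitcher B", "Pitcher B", "Pitcher B"),
  base_state = c("000", "100", "000", "000", "000", "110"),
  RUNS_VALUE = c(-0.25, 0.40, -0.17, 1.00, -0.25, -0.60),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draws one strip chart panel per pitcher and
# invisibly returns the number of panels drawn.
plot_run_values <- function(df) {
  pitchers <- unique(df$pitcher)
  old_par <- par(mfrow = c(1, length(pitchers)))
  on.exit(par(old_par))
  for (p in pitchers) {
    one <- df[df$pitcher == p, ]
    one$base_state <- factor(one$base_state)  # only the states this pitcher faced
    stripchart(RUNS_VALUE ~ base_state, data = one,
               method = "jitter", vertical = TRUE, pch = 16,
               xlab = "Base state", ylab = "Change in run expectancy",
               main = p)
    abline(h = 0, lty = 2)  # reference line: no change in run expectancy
  }
  invisible(length(pitchers))
}
```

Calling `plot_run_values(pitches)` draws the panels on the active graphics device. Jittering spreads out overlapping points, which matters with real data because many plays share the same run value.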
Then, RE24 can be calculated by summing every change in run expectancy for each pitcher
RE24 is typically a counting stat, but code is included to convert it into a rate stat as well
!! It is important to note that for pitchers, the cumulative sum of changes in run expectancy is negative for good pitchers and positive for bad pitchers, so it needs to be multiplied by -1 to match the typical RE24 format shown below !!
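The summation and sign flip can be sketched as follows. The per-100-batters-faced rate is only an assumption about what a rate version might look like; the project's actual denominator may differ. The data again assumes one row per batter faced, with placeholder pitchers and values.

```r
# Placeholder data: one row per batter faced (values are made up).
pitches <- data.frame(
  pitcher    = c(rep("Pitcher A", 4), rep("Pitcher B", 4)),
  RUNS_VALUE = c(-0.25, 0.40, -0.17, -0.30, 1.00, -0.25, -0.60, 0.45),
  stringsAsFactors = FALSE
)

# Raw sum of run-expectancy changes per pitcher (negative = good for pitchers).
raw_sum <- tapply(pitches$RUNS_VALUE, pitches$pitcher, sum)
faced   <- tapply(pitches$RUNS_VALUE, pitches$pitcher, length)

re24 <- data.frame(
  pitcher       = names(raw_sum),
  batters_faced = as.vector(faced),
  RE24          = -as.vector(raw_sum),  # flip the sign so positive = good
  stringsAsFactors = FALSE
)

# Assumed rate version: RE24 per 100 batters faced.
re24$RE24_per_100 <- 100 * re24$RE24 / re24$batters_faced

print(re24)
```

In this toy data Pitcher A allowed a net -0.32 runs of expectancy, which becomes an RE24 of +0.32 after the sign flip.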
Finally, the code includes a way to build leaderboards for RE24 and for my rate version of RE24
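A leaderboard is essentially a sorted summary table. The sketch below shows one possible shape, using a hypothetical `make_leaderboard()` helper and made-up numbers. The minimum batters-faced filter for the rate board is an assumption added here to keep tiny samples from dominating, not something taken from the project.

```r
# Placeholder per-pitcher summary (values are made up).
summary_tbl <- data.frame(
  pitcher       = c("Pitcher A", "Pitcher B", "Pitcher C"),
  batters_faced = c(700, 650, 40),
  RE24          = c(25.1, -4.3, 3.0),
  stringsAsFactors = FALSE
)
summary_tbl$RE24_per_100 <- 100 * summary_tbl$RE24 / summary_tbl$batters_faced

# Hypothetical helper: rank pitchers by `stat`, best first,
# keeping only those with at least `min_bf` batters faced.
make_leaderboard <- function(tbl, stat, min_bf = 0) {
  eligible <- tbl[tbl$batters_faced >= min_bf, ]
  board <- eligible[order(eligible[[stat]], decreasing = TRUE), ]
  board$rank <- seq_len(nrow(board))
  rownames(board) <- NULL
  board
}

counting_board <- make_leaderboard(summary_tbl, "RE24")
rate_board     <- make_leaderboard(summary_tbl, "RE24_per_100", min_bf = 100)

print(counting_board)
print(rate_board)
```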
Chapter 5 of the book 'Analyzing Baseball Data with R' guided me in building out this project, although the authors use Retrosheet data instead of Statcast
Statcast data was scraped in R using baseballr::statcast_search_pitchers()
Please feel free to email me at josephmontes.baseball@gmail.com with any questions, suggestions, or comments about this project. I can also share the large CSV file containing the relevant Statcast pitch by pitch data.