Chapter 4 Statcast
Type of data available: pitch-by-pitch data and summary statistics (in leaderboards)
Statcast is the tracking technology used in Major League Baseball. There are two ways to use Statcast data in R:
- Downloading the data directly from the Baseball Savant website and importing it into R
- Using the baseballr package
Note: Statcast data only goes back to 2015.
4.1 Baseball Savant Website
The Statcast data on Baseball Savant is highly customizable. The leaderboards let you choose the positions, teams, seasons, and minimum thresholds to include. A menu in the top right corner selects the type of leaderboard: hitting, pitching, fielding, running, or positioning. Leaderboard data can be downloaded as a CSV file. Statcast Search allows much more customization. Here is a screenshot of the page:
In the top left corner of the results table there are three icons. The middle (blue) icon downloads the search results as a CSV file. The rightmost icon downloads the underlying pitch-by-pitch data, so that file is very large; it usually works only for short lists of results and will otherwise time out.
The file we downloaded is the default table from clicking on “Statistics”, “Player Pitching”, “2022”. No variables were added or removed.
library(readr)
sc_download <- read_csv("data/sc_pitching_2022.csv")
last_name | first_name | player_id | year | xba | xslg | xwoba | xobp | xiso | exit_velocity_avg | launch_angle_avg | barrel_batted_rate |
---|---|---|---|---|---|---|---|---|---|---|---|
Wainwright | Adam | 425794 | 2022 | 0.270 | 0.419 | 0.328 | 0.326 | 0.148 | 87.8 | 11.0 | 6.5 |
Verlander | Justin | 434378 | 2022 | 0.207 | 0.331 | 0.255 | 0.248 | 0.124 | 87.8 | 16.9 | 6.3 |
Kluber | Corey | 446372 | 2022 | 0.261 | 0.416 | 0.310 | 0.294 | 0.155 | 87.1 | 18.5 | 6.9 |
Morton | Charlie | 450203 | 2022 | 0.228 | 0.389 | 0.314 | 0.315 | 0.161 | 89.3 | 13.6 | 9.5 |
Quintana | Jose | 500779 | 2022 | 0.257 | 0.377 | 0.305 | 0.312 | 0.120 | 86.5 | 9.9 | 5.5 |
Gibson | Kyle | 502043 | 2022 | 0.262 | 0.420 | 0.326 | 0.320 | 0.158 | 88.5 | 10.6 | 7.5 |
Darvish | Yu | 506433 | 2022 | 0.227 | 0.388 | 0.291 | 0.275 | 0.161 | 88.5 | 17.0 | 8.8 |
Kelly | Merrill | 518876 | 2022 | 0.230 | 0.383 | 0.296 | 0.291 | 0.153 | 88.5 | 14.0 | 8.3 |
Perez | Martin | 527048 | 2022 | 0.242 | 0.346 | 0.295 | 0.311 | 0.105 | 88.2 | 8.1 | 4.3 |
Anderson | Tyler | 542881 | 2022 | 0.225 | 0.350 | 0.275 | 0.272 | 0.124 | 85.0 | 16.9 | 4.9 |
Cole | Gerrit | 543037 | 2022 | 0.214 | 0.383 | 0.284 | 0.266 | 0.169 | 89.4 | 12.6 | 9.5 |
Lyles | Jordan | 543475 | 2022 | 0.267 | 0.452 | 0.341 | 0.325 | 0.184 | 88.6 | 15.3 | 10.4 |
Mikolas | Miles | 571945 | 2022 | 0.252 | 0.400 | 0.306 | 0.294 | 0.148 | 87.8 | 11.0 | 6.9 |
Gausman | Kevin | 592332 | 2022 | 0.242 | 0.380 | 0.285 | 0.272 | 0.138 | 89.0 | 12.2 | 8.1 |
Ray | Robbie | 592662 | 2022 | 0.223 | 0.373 | 0.295 | 0.292 | 0.150 | 89.7 | 14.7 | 7.9 |
Taillon | Jameson | 592791 | 2022 | 0.260 | 0.432 | 0.317 | 0.297 | 0.172 | 88.5 | 14.5 | 8.3 |
Gonzales | Marco | 594835 | 2022 | 0.267 | 0.435 | 0.330 | 0.320 | 0.169 | 86.7 | 14.5 | 7.2 |
Pivetta | Nick | 601713 | 2022 | 0.248 | 0.433 | 0.332 | 0.324 | 0.185 | 90.7 | 15.2 | 9.0 |
Bassitt | Chris | 605135 | 2022 | 0.228 | 0.359 | 0.290 | 0.291 | 0.132 | 85.7 | 10.8 | 6.6 |
Musgrove | Joe | 605397 | 2022 | 0.225 | 0.351 | 0.282 | 0.284 | 0.126 | 86.4 | 11.7 | 6.0 |
Nola | Aaron | 605400 | 2022 | 0.211 | 0.340 | 0.259 | 0.248 | 0.129 | 87.7 | 12.5 | 7.1 |
Rodon | Carlos | 607074 | 2022 | 0.198 | 0.309 | 0.254 | 0.260 | 0.111 | 89.0 | 19.4 | 6.5 |
Freeland | Kyle | 607536 | 2022 | 0.271 | 0.455 | 0.346 | 0.337 | 0.184 | 89.8 | 12.7 | 9.7 |
Fried | Max | 608331 | 2022 | 0.227 | 0.328 | 0.264 | 0.266 | 0.101 | 86.2 | 7.6 | 4.0 |
Irvin | Cole | 608344 | 2022 | 0.258 | 0.451 | 0.324 | 0.301 | 0.192 | 89.4 | 15.5 | 9.6 |
Marquez | German | 608566 | 2022 | 0.256 | 0.423 | 0.327 | 0.321 | 0.167 | 90.5 | 9.1 | 7.1 |
Quantrill | Cal | 615698 | 2022 | 0.258 | 0.417 | 0.321 | 0.313 | 0.159 | 87.6 | 13.8 | 7.5 |
Berrios | Jose | 621244 | 2022 | 0.275 | 0.466 | 0.346 | 0.329 | 0.191 | 90.0 | 13.9 | 9.5 |
Urias | Julio | 628711 | 2022 | 0.205 | 0.332 | 0.262 | 0.258 | 0.128 | 86.7 | 17.2 | 6.7 |
Lopez | Pablo | 641154 | 2022 | 0.239 | 0.378 | 0.301 | 0.302 | 0.138 | 87.9 | 11.0 | 9.0 |
Alcantara | Sandy | 645261 | 2022 | 0.215 | 0.331 | 0.267 | 0.268 | 0.116 | 87.8 | 5.5 | 5.3 |
Cease | Dylan | 656302 | 2022 | 0.184 | 0.292 | 0.257 | 0.273 | 0.109 | 86.8 | 15.0 | 6.2 |
Montgomery | Jordan | 656756 | 2022 | 0.258 | 0.400 | 0.310 | 0.303 | 0.143 | 88.5 | 9.8 | 7.1 |
Wright | Kyle | 657140 | 2022 | 0.244 | 0.384 | 0.306 | 0.308 | 0.139 | 89.0 | 4.4 | 6.8 |
Webb | Logan | 657277 | 2022 | 0.247 | 0.364 | 0.295 | 0.301 | 0.116 | 88.9 | 3.1 | 5.5 |
Ohtani | Shohei | 660271 | 2022 | 0.204 | 0.311 | 0.256 | 0.260 | 0.107 | 87.1 | 14.5 | 6.3 |
McKenzie | Triston | 663474 | 2022 | 0.222 | 0.397 | 0.293 | 0.274 | 0.175 | 90.2 | 19.7 | 9.8 |
McClanahan | Shane | 663556 | 2022 | 0.207 | 0.332 | 0.261 | 0.257 | 0.126 | 87.6 | 8.3 | 6.4 |
Valdez | Framber | 664285 | 2022 | 0.227 | 0.330 | 0.284 | 0.301 | 0.102 | 89.8 | -3.6 | 5.8 |
Urquidy | Jose | 664353 | 2022 | 0.256 | 0.451 | 0.329 | 0.306 | 0.194 | 89.7 | 17.2 | 9.4 |
Manoah | Alek | 666201 | 2022 | 0.224 | 0.343 | 0.284 | 0.290 | 0.118 | 87.5 | 16.9 | 5.4 |
Gallen | Zac | 668678 | 2022 | 0.213 | 0.341 | 0.278 | 0.279 | 0.127 | 87.8 | 10.8 | 7.8 |
Burnes | Corbin | 669203 | 2022 | 0.212 | 0.337 | 0.273 | 0.275 | 0.124 | 87.2 | 10.0 | 5.7 |
Gilbert | Logan | 669302 | 2022 | 0.253 | 0.408 | 0.314 | 0.308 | 0.155 | 91.0 | 14.6 | 7.1 |
Bieber | Shane | 669456 | 2022 | 0.245 | 0.386 | 0.292 | 0.281 | 0.141 | 89.9 | 10.0 | 7.2 |
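Once the CSV is loaded, ordinary data-frame operations apply. As a minimal sketch, the block below ranks pitchers by expected wOBA allowed (lower is better for a pitcher) and checks that xiso equals xslg minus xba up to rounding, since isolated power is slugging percentage minus batting average. So that it runs without the CSV, it uses a hypothetical six-row stand-in called pitchers, copied from the table above; with the real file you would apply the same order() call to sc_download.

```r
# Hypothetical stand-in: six rows copied from the 2022 pitching table above
pitchers <- data.frame(
  last_name = c("Verlander", "Cole", "Rodon", "Freeland", "Cease", "Ohtani"),
  player_id = c(434378, 543037, 607074, 607536, 656302, 660271),
  xba   = c(0.207, 0.214, 0.198, 0.271, 0.184, 0.204),
  xslg  = c(0.331, 0.383, 0.309, 0.455, 0.292, 0.311),
  xiso  = c(0.124, 0.169, 0.111, 0.184, 0.109, 0.107),
  xwoba = c(0.255, 0.284, 0.254, 0.346, 0.257, 0.256)
)

# Isolated power is SLG minus AVG, so xiso should match xslg - xba up to rounding
pitchers$xiso_check <- pitchers$xslg - pitchers$xba

# Lowest expected wOBA allowed first
ranked_pitchers <- pitchers[order(pitchers$xwoba), ]
ranked_pitchers[, c("last_name", "xwoba", "xiso", "xiso_check")]
```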
4.2 baseballr
library(baseballr)
There are four functions specifically for Statcast: statcast_search(), statcast_search_batters(), statcast_search_pitchers(), and statcast_leaderboards().
Search
The search functions take a start date, an end date, and a player ID as arguments. As with FanGraphs, a player’s ID appears in the URL of his Baseball Savant page: it is the string of digits after his name. For example, Aaron Judge’s page is https://baseballsavant.mlb.com/savant-player/aaron-judge-592450?stats=statcast-r-hitting-mlb, so his Statcast playerid (his MLBAM ID) is 592450. You can also find the ID on the MLB website: search for the player, and the end of the resulting URL reads playerId= followed by a number. For example, the search link https://www.mlb.com/search?q=aaron%20ju&playerId=592450 identifies Aaron Judge’s playerid as 592450.
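Because the ID sits in a predictable place in both kinds of URL, it can also be pulled out programmatically. The helper below, extract_mlbam_id(), is hypothetical (it is not part of baseballr) and assumes only the two URL shapes shown above: a Savant path ending in name-digits, or an MLB.com link carrying playerId=digits.

```r
# Hypothetical helper (not part of baseballr): pull the MLBAM ID out of either
# a Baseball Savant player URL or an MLB.com search URL
extract_mlbam_id <- function(url) {
  # MLB.com search links carry the ID in the query string as playerId=<digits>
  if (grepl("playerId=[0-9]+", url, perl = TRUE)) {
    return(as.integer(sub(".*playerId=([0-9]+).*", "\\1", url, perl = TRUE)))
  }
  # Baseball Savant player pages end their path with <name>-<digits>
  path <- sub("\\?.*$", "", url, perl = TRUE)  # drop any query string
  if (grepl("-[0-9]+/?$", path, perl = TRUE)) {
    return(as.integer(sub(".*-([0-9]+)/?$", "\\1", path, perl = TRUE)))
  }
  NA_integer_  # no recognisable ID in the URL
}

extract_mlbam_id("https://baseballsavant.mlb.com/savant-player/aaron-judge-592450?stats=statcast-r-hitting-mlb")  # 592450
```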
judge <- statcast_search(start_date = "2022-06-15", end_date = "2022-07-15", playerid = 592450)
This returns pitch-by-pitch data for every pitch thrown to Judge in the designated time frame. The result has 92 variables and 504 rows (one per pitch) covering 27 games. Because the number of rows grows quickly with the length of the date range, keep the size of the result in mind when calling this function.
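One way to keep each request small is to split a long date range into shorter windows, query each window separately, and stack the pieces. The sketch below assumes weekly windows are small enough; Baseball Savant's actual per-request limit is not stated here, and date_windows() is a hypothetical helper, not a baseballr function. Only the window arithmetic runs as written; the statcast_search() loop is left in comments because it needs a network connection.

```r
# Hypothetical helper: split [start_date, end_date] into consecutive windows
# of at most `days` days each (both ends inclusive)
date_windows <- function(start_date, end_date, days = 7) {
  start_date <- as.Date(start_date)
  end_date <- as.Date(end_date)
  starts <- seq(start_date, end_date, by = days)
  ends <- starts + (days - 1)
  ends[ends > end_date] <- end_date  # clip the last window to the range
  data.frame(start = starts, end = ends)
}

windows <- date_windows("2022-06-15", "2022-07-15")
windows

# With baseballr loaded and a network connection, each window could be
# queried and the results stacked (not run here):
# pieces <- lapply(seq_len(nrow(windows)), function(i) {
#   statcast_search(start_date = as.character(windows$start[i]),
#                   end_date = as.character(windows$end[i]),
#                   playerid = 592450)
# })
# judge <- dplyr::bind_rows(pieces)
```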
Important note: the Statcast ID is not the same as the FanGraphs ID!
Leaderboards
There are two main arguments for accessing a leaderboard: ‘leaderboard’ and ‘year’. The options for ‘leaderboard’ are “exit_velocity_barrels”, “expected_statistics”, “pitch_arsenal”, “outs_above_average”, “directional_oaa”, “catch_probability”, “pop_time”, “sprint_speed”, and “running_splits_90_ft”. Optional arguments can narrow the observations returned; the full list is in the function’s documentation.
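Because ‘leaderboard’ must be one of the names listed above, it can be convenient to check the name locally before making a request. As a hedged sketch, the hypothetical wrapper check_leaderboard() does this with base R's match.arg(), which also accepts an unambiguous prefix. The vector simply restates the options listed above; baseballr itself may validate the argument differently.

```r
# The leaderboard names listed above
sc_leaderboard_types <- c(
  "exit_velocity_barrels", "expected_statistics", "pitch_arsenal",
  "outs_above_average", "directional_oaa", "catch_probability",
  "pop_time", "sprint_speed", "running_splits_90_ft"
)

# Hypothetical helper: return the full leaderboard name, or stop with an error
# listing the valid choices; match.arg() also accepts an unambiguous prefix
check_leaderboard <- function(leaderboard) {
  match.arg(leaderboard, sc_leaderboard_types)
}

check_leaderboard("pop")  # "pop_time"

# Usage with baseballr (requires a network connection, not run here):
# sc_lead_sprint <- statcast_leaderboards(
#   leaderboard = check_leaderboard("sprint_speed"), year = 2022)
```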
sc_lead_evb <- statcast_leaderboards(leaderboard = "exit_velocity_barrels", year = 2022)
sc_lead_exp <- statcast_leaderboards(leaderboard = "expected_statistics", year = 2022)
year | last_name, first_name | player_id | attempts | avg_hit_angle | anglesweetspotpercent | max_hit_speed | avg_hit_speed | ev50 | fbld | gb | max_distance | avg_distance | avg_hr_distance | ev95plus | ev95percent | barrels | brl_percent | brl_pa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022 | Semien, Marcus | 543760 | 547 | 19.9 | 32.4 | 110.1 | 87.3 | 97.8 | 91.3 | 84.2 | 430 | 186 | 394 | 191 | 34.9 | 37 | 6.8 | 5.1 |
2022 | Rosario, Amed | 642708 | 530 | 5.0 | 31.1 | 110.8 | 88.4 | 99.7 | 92.1 | 86.9 | 450 | 138 | 407 | 203 | 38.3 | 24 | 4.5 | 3.6 |
2022 | Ramírez, José | 608070 | 528 | 20.7 | 33.7 | 114.2 | 87.7 | 98.9 | 91.2 | 86.9 | 422 | 179 | 392 | 195 | 36.9 | 35 | 6.6 | 5.1 |
2022 | Turner, Trea | 607208 | 527 | 10.2 | 35.3 | 112.5 | 88.9 | 100.2 | 92.1 | 86.3 | 439 | 163 | 402 | 219 | 41.6 | 40 | 7.6 | 5.6 |
2022 | Guerrero Jr., Vladimir | 665489 | 526 | 4.3 | 27.9 | 118.4 | 92.8 | 105.5 | 98.2 | 90.3 | 467 | 144 | 407 | 265 | 50.4 | 59 | 11.2 | 8.4 |
2022 | Freeman, Freddie | 518692 | 517 | 13.6 | 42.9 | 112.3 | 91.3 | 100.8 | 94.5 | 87.5 | 446 | 185 | 407 | 247 | 47.8 | 51 | 9.9 | 7.2 |
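Several columns in this leaderboard are rates built from counts in the same row. For the rows shown, brl_percent equals 100 × barrels / attempts and ev95percent equals 100 × ev95plus / attempts, to one decimal place; attempts also matches the bip count in the expected-statistics leaderboard. The minimal sketch below checks this on a stand-in data frame copied from the table; the name evb is hypothetical, and with live data you would use sc_lead_evb instead.

```r
# Hypothetical stand-in copied from the exit_velocity_barrels rows above
evb <- data.frame(
  player_id   = c(543760, 642708, 608070, 607208, 665489, 518692),
  attempts    = c(547, 530, 528, 527, 526, 517),
  ev95plus    = c(191, 203, 195, 219, 265, 247),
  ev95percent = c(34.9, 38.3, 36.9, 41.6, 50.4, 47.8),
  barrels     = c(37, 24, 35, 40, 59, 51),
  brl_percent = c(6.8, 4.5, 6.6, 7.6, 11.2, 9.9)
)

# Recompute both rates as percentages of batted-ball attempts
evb$brl_check  <- 100 * evb$barrels / evb$attempts
evb$ev95_check <- 100 * evb$ev95plus / evb$attempts

# Agreement to within rounding at one decimal place
all(abs(evb$brl_check - evb$brl_percent) <= 0.05)
all(abs(evb$ev95_check - evb$ev95percent) <= 0.05)
```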
year | last_name, first_name | player_id | pa | bip | ba | est_ba | est_ba_minus_ba_diff | slg | est_slg | est_slg_minus_slg_diff | woba | est_woba | est_woba_minus_woba_diff |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022 | Semien, Marcus | 543760 | 724 | 547 | 0.248 | 0.243 | 0.005 | 0.429 | 0.394 | 0.035 | 0.317 | 0.306 | 0.011 |
2022 | Freeman, Freddie | 518692 | 708 | 517 | 0.325 | 0.313 | 0.012 | 0.511 | 0.538 | -0.027 | 0.393 | 0.403 | -0.010 |
2022 | Turner, Trea | 607208 | 708 | 527 | 0.298 | 0.276 | 0.022 | 0.466 | 0.432 | 0.034 | 0.350 | 0.335 | 0.015 |
2022 | Lindor, Francisco | 596019 | 706 | 504 | 0.270 | 0.254 | 0.016 | 0.449 | 0.427 | 0.022 | 0.342 | 0.331 | 0.011 |
2022 | Guerrero Jr., Vladimir | 665489 | 706 | 526 | 0.274 | 0.281 | -0.007 | 0.480 | 0.464 | 0.016 | 0.351 | 0.351 | 0.000 |
2022 | Olson, Matt | 621566 | 699 | 450 | 0.240 | 0.248 | -0.008 | 0.477 | 0.467 | 0.010 | 0.344 | 0.347 | -0.003 |
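In the rows shown, each diff column equals the actual statistic minus the expected one, despite the est_..._minus_... naming. For example, Semien's woba of 0.317 minus his est_woba of 0.306 gives 0.011. A positive value therefore means the hitter's results beat what his contact quality predicted. The sketch below recomputes the wOBA difference on a stand-in copied from the table (the name exp_stats is hypothetical; with live data use sc_lead_exp) and sorts hitters by how far they outperformed their expected wOBA.

```r
# Hypothetical stand-in copied from the expected_statistics rows above
exp_stats <- data.frame(
  player_id = c(543760, 518692, 607208, 596019, 665489, 621566),
  woba      = c(0.317, 0.393, 0.350, 0.342, 0.351, 0.344),
  est_woba  = c(0.306, 0.403, 0.335, 0.331, 0.351, 0.347),
  est_woba_minus_woba_diff = c(0.011, -0.010, 0.015, 0.011, 0.000, -0.003)
)

# Actual minus expected: positive means the hitter outperformed his xwOBA
exp_stats$woba_diff <- exp_stats$woba - exp_stats$est_woba

# Largest overperformers first
ranked_hitters <- exp_stats[order(-exp_stats$woba_diff), ]
ranked_hitters[, c("player_id", "woba", "est_woba", "woba_diff")]
```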