Chapter 2 Lahman

The Lahman package contains several tables with data on pitching, hitting, fielding, and more. The package is updated after each season. This chapter references version 10.0-1.

The data are recorded season by season. Batting and Pitching are further broken down by team, so a player who played for multiple teams in one season has a separate row for each team. Fielding is broken down by both team and position.

The main tables are:

  • People : player names, dates of birth, death, and other biographical info
  • Batting : batting statistics
    • battingStats() : a function that calculates batting average (BA), plate appearances (PA), total bases (TB), slugging percentage (SlugPct), on-base percentage (OBP), on-base percentage + slugging (OPS), and batting average on balls in play (BABIP)
  • Pitching : pitching statistics
  • Fielding : fielding statistics
  • Salaries : player ID, year, team, league, and salary (from 1985-2016)
  • Teams : overall season statistics
  • Additional tables cover information such as All Star appearances, post season data, managers, Hall of Fame voting, awards, parks, and colleges.

A full list of the tables and their contents can be found in the documentation.

library(Lahman)
library(dplyr)  # provides %>%, filter(), arrange(), and sample_n() used below

2.1 People

Here is an example of the People table. Each row contains data about a player’s birth date, hometown, death date, physical characteristics, first game, last game, and their ID for a few popular databases.

People %>% 
  sample_n(6)
playerID birthYear birthMonth birthDay birthCountry birthState birthCity deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst nameLast nameGiven weight height bats throws debut finalGame retroID bbrefID deathDate birthDate
lepcite01 1929 7 28 USA NY Utica 2019 12 11 USA MA Dedham Ted Lepcio Thaddeus Stanley 177 70 R R 1952-04-15 1961-09-11 lepct101 lepcite01 2019-12-11 1929-07-28
brazecr01 1980 5 10 USA AL Montgomery NA NA NA NA NA NA Craig Brazell Craig Walter 210 75 L R 2004-08-17 2007-09-30 brazc001 brazecr01 NA 1980-05-10
headed01 1918 1 25 USA LA Selma 1980 1 31 USA LA Bastrop Ed Head Edward Marvin 175 73 R R 1940-07-27 1946-08-25 heade101 headed01 1980-01-31 1918-01-25
harrian01 1888 11 13 USA MA Wakefield 1938 11 12 USA MA Malden Andy Harrington Andrew Francis 193 72 R R 1913-09-08 1913-09-08 harra102 harrian01 1938-11-12 1888-11-13
slaugba01 1884 10 6 USA DE Smyrna 1961 5 17 USA PA Philadelphia Barney Slaughter Byron Atkins 165 71 R R 1910-08-09 1910-10-11 slaub101 slaugba01 1961-05-17 1884-10-06
kerfech01 1963 9 28 USA MO Knob Noster NA NA NA NA NA NA Charlie Kerfeld Charles Patrick 225 78 R R 1985-07-27 1990-07-19 kerfc001 kerfech01 NA 1963-09-28

2.2 Batting

Example: José Iglesias

Here is what the Batting table looks like for a single player. In seasons where Iglesias played for multiple teams (2021 and 2013), each team has its own row, and the stint column numbers those rows in order.

iglesias <- Batting %>% 
  filter(playerID == "iglesjo01") %>% 
  arrange(desc(yearID))
playerID yearID stint teamID lgID G AB R H X2B X3B HR RBI SB CS BB SO IBB HBP SH SF GIDP
iglesjo01 2022 1 COL NL 118 439 48 128 30 0 3 47 2 3 17 56 0 8 0 3 11
iglesjo01 2021 1 LAA AL 114 424 57 110 23 1 8 41 5 2 18 66 0 4 0 1 10
iglesjo01 2021 2 BOS AL 23 59 8 21 4 1 1 7 0 0 3 9 0 2 0 0 0
iglesjo01 2020 1 BAL AL 39 142 16 53 17 0 3 24 0 0 3 17 0 4 0 1 1
iglesjo01 2019 1 CIN NL 146 504 62 145 21 3 11 59 6 6 20 70 3 3 1 2 17
iglesjo01 2018 1 DET AL 125 432 43 116 31 3 5 48 15 6 19 47 0 8 3 2 11
iglesjo01 2017 1 DET AL 130 463 56 118 33 1 6 54 7 4 21 65 0 1 3 1 6
iglesjo01 2016 1 DET AL 137 467 57 119 26 0 4 32 7 4 28 50 1 8 7 3 12
iglesjo01 2015 1 DET AL 120 416 44 125 17 3 2 23 11 8 25 44 2 6 4 3 10
iglesjo01 2013 1 BOS AL 63 215 27 71 10 2 1 19 3 1 11 30 0 6 0 2 4
iglesjo01 2013 2 DET AL 46 135 12 35 6 0 2 10 2 1 4 30 0 5 4 0 3
iglesjo01 2012 1 BOS AL 25 68 5 8 2 0 1 2 1 0 4 16 0 3 2 0 2
iglesjo01 2011 1 BOS AL 10 6 3 2 0 0 0 0 0 0 0 2 0 0 0 0 0
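Because each stint is a separate row, season totals have to be built by summing across stints. The following is a minimal sketch of that idea, not a method taken from the package. It uses base R and a small data frame (iglesias_rows, a name made up for this illustration) typed in from the output above, so it runs without loading Lahman.

```r
# Two of Iglesias's seasons, copied from the Batting output above.
# Only a few counting columns are kept for brevity.
iglesias_rows <- data.frame(
  yearID = c(2021, 2021, 2013, 2013),
  stint  = c(1, 2, 1, 2),
  teamID = c("LAA", "BOS", "BOS", "DET"),
  G      = c(114, 23, 63, 46),
  AB     = c(424, 59, 215, 135),
  H      = c(110, 21, 71, 35),
  HR     = c(8, 1, 1, 2)
)

# Sum the counting statistics within each season, ignoring team and stint.
season_totals <- aggregate(cbind(G, AB, H, HR) ~ yearID,
                           data = iglesias_rows, FUN = sum)

# A rate statistic such as batting average is recomputed from the summed
# counts rather than averaged across stints.
season_totals$BA <- round(season_totals$H / season_totals$AB, 3)
season_totals
```

Recomputing the rate from the totals weights each stint by its at bats, which a simple average of the two per-team batting averages would not do.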

battingStats() returns a data frame with the same variables as Batting plus the calculated statistics listed above. The table below shows only playerID, yearID, teamID, and the variables added by battingStats().

iglesias_stats <- battingStats(iglesias) %>% 
  dplyr::select(playerID, yearID, teamID, BA:BABIP)
playerID yearID teamID BA PA TB SlugPct OBP OPS BABIP
iglesjo01 2022 COL 0.292 467 167 0.380 0.328 0.708 0.326
iglesjo01 2021 LAA 0.259 447 159 0.375 0.295 0.670 0.291
iglesjo01 2021 BOS 0.356 64 30 0.508 0.406 0.914 0.408
iglesjo01 2020 BAL 0.373 150 79 0.556 0.400 0.956 0.407
iglesjo01 2019 CIN 0.288 530 205 0.407 0.318 0.725 0.315
iglesjo01 2018 DET 0.269 464 168 0.389 0.310 0.699 0.291
iglesjo01 2017 DET 0.255 489 171 0.369 0.288 0.657 0.285
iglesjo01 2016 DET 0.255 513 157 0.336 0.306 0.642 0.276
iglesjo01 2015 DET 0.300 454 154 0.370 0.347 0.717 0.330
iglesjo01 2013 BOS 0.330 234 88 0.409 0.376 0.785 0.376
iglesjo01 2013 DET 0.259 148 47 0.348 0.306 0.654 0.320
iglesjo01 2012 BOS 0.118 77 13 0.191 0.200 0.391 0.137
iglesjo01 2011 BOS 0.333 6 2 0.333 0.333 0.666 0.500
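To see where these numbers come from, the sketch below recomputes them by hand for Iglesias's 2022 row. The formulas are the conventional definitions of each statistic and are an assumption here rather than a copy of the package source; they do reproduce the first row of the table above. The data frame b and the object stats are names chosen for this illustration.

```r
# Iglesias's 2022 line, copied from the Batting output in this section.
b <- data.frame(AB = 439, H = 128, X2B = 30, X3B = 0, HR = 3,
                BB = 17, SO = 56, HBP = 8, SH = 0, SF = 3)

stats <- with(b, data.frame(
  BA    = H / AB,                                # batting average
  PA    = AB + BB + HBP + SH + SF,               # plate appearances
  TB    = H + X2B + 2 * X3B + 3 * HR,            # total bases (H already counts one base per hit)
  OBP   = (H + BB + HBP) / (AB + BB + HBP + SF), # on-base percentage
  BABIP = (H - HR) / (AB - SO - HR + SF)         # batting average on balls in play
))

stats$SlugPct <- stats$TB / b$AB                 # slugging percentage
stats$OPS     <- stats$OBP + stats$SlugPct       # on-base plus slugging

round(stats, 3)
```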

2.3 Pitching

Example: Justin Verlander

Just like the Batting table, the Pitching table has one row per season, or one row per team if the player pitched for several teams in one year. There are two rows for 2017 because Verlander moved from the Tigers to the Astros.

verlander <- Pitching %>% 
  filter(playerID == "verlaju01") %>% 
  arrange(desc(yearID))
playerID yearID stint teamID lgID W L G GS CG SHO SV IPouts H ER HR BB SO BAOpp ERA IBB WP HBP BK BFP GF R SH SF GIDP
verlaju01 2022 1 HOU AL 18 4 28 28 0 0 0 525 116 34 12 29 185 0.186 1.75 0 3 6 0 666 0 43 1 5 8
verlaju01 2020 1 HOU AL 1 0 1 1 0 0 0 18 3 2 2 1 7 0.150 3.00 0 0 0 0 21 0 2 0 0 1
verlaju01 2019 1 HOU AL 21 6 34 34 2 1 0 669 137 64 36 42 300 0.172 2.58 0 4 6 0 847 0 66 0 2 7
verlaju01 2018 1 HOU AL 16 9 34 34 1 1 0 642 156 60 28 37 290 0.200 2.52 0 5 8 2 833 0 63 2 5 3
verlaju01 2017 1 DET AL 10 8 28 28 0 0 0 516 153 73 23 67 176 0.234 3.82 4 5 3 0 729 0 76 1 4 8
verlaju01 2017 2 HOU AL 5 0 5 5 0 0 0 102 17 4 4 5 43 0.149 1.06 0 0 1 0 120 0 4 0 0 4
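The split 2017 season is a useful reminder that rate statistics like ERA cannot simply be averaged across rows. The sketch below assumes the usual definition of ERA (nine times earned runs divided by innings pitched) and that IPouts counts outs recorded, so innings pitched is IPouts divided by three. The values are typed in from the two 2017 rows above, and the object names are made up for this illustration.

```r
# Verlander's two 2017 stints, copied from the Pitching output above.
verlander_2017 <- data.frame(
  teamID = c("DET", "HOU"),
  IPouts = c(516, 102),
  ER     = c(73, 4)
)

# Per-stint ERA: innings pitched is IPouts / 3, and ERA = 9 * ER / IP.
verlander_2017$IP  <- verlander_2017$IPouts / 3
verlander_2017$ERA <- round(9 * verlander_2017$ER / verlander_2017$IP, 2)
verlander_2017

# Season ERA: sum earned runs and outs first, then apply the same formula.
season_era <- round(9 * sum(verlander_2017$ER) /
                      (sum(verlander_2017$IPouts) / 3), 2)
season_era
```

The per-stint values match the ERA column in the table, while the season figure is weighted toward the Detroit stint because far more innings were pitched there.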

2.4 Fielding

Example: DJ LeMahieu

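The Fielding table contains some of the basic statistics used for fielders. Notice that the last five variables (PB, WP, SB, CS, and ZR) are all “NA” for LeMahieu. NA is how R marks a missing value, so the Lahman tables show “NA” rather than a blank cell wherever a statistic was not recorded. PB, WP, SB, and CS are catcher statistics, so they are missing for the positions LeMahieu played.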

Fielding %>% 
  filter(playerID == "lemahdj01") %>% 
  arrange(desc(yearID))
playerID yearID stint teamID lgID POS G GS InnOuts PO A E DP PB WP SB CS ZR
lemahdj01 2022 1 NYA AL 1B 35 31 795 230 13 1 16 NA NA NA NA NA
lemahdj01 2022 1 NYA AL 2B 41 35 938 53 91 2 21 NA NA NA NA NA
lemahdj01 2022 1 NYA AL 3B 47 43 1157 23 106 1 7 NA NA NA NA NA
lemahdj01 2021 1 NYA AL 1B 55 33 963 286 13 1 21 NA NA NA NA NA
lemahdj01 2021 1 NYA AL 2B 83 77 1989 117 156 2 48 NA NA NA NA NA
lemahdj01 2021 1 NYA AL 3B 39 36 897 18 62 6 6 NA NA NA NA NA
lemahdj01 2020 1 NYA AL 1B 11 1 72 24 0 0 1 NA NA NA NA NA
lemahdj01 2020 1 NYA AL 2B 37 34 812 51 82 4 19 NA NA NA NA NA
lemahdj01 2020 1 NYA AL 3B 11 11 261 10 16 2 2 NA NA NA NA NA
lemahdj01 2019 1 NYA AL 1B 40 28 786 215 19 2 24 NA NA NA NA NA
lemahdj01 2019 1 NYA AL 2B 75 66 1739 118 155 2 32 NA NA NA NA NA
lemahdj01 2019 1 NYA AL 3B 52 47 1200 18 87 4 7 NA NA NA NA NA
lemahdj01 2018 1 COL NL 2B 128 127 3345 209 378 4 90 NA NA NA NA NA
lemahdj01 2017 1 COL NL 2B 153 151 3906 251 470 8 106 NA NA NA NA NA
lemahdj01 2016 1 COL NL 2B 146 144 3728 276 422 6 91 NA NA NA NA NA
lemahdj01 2015 1 COL NL 2B 149 146 3852 300 452 9 120 NA NA NA NA NA
lemahdj01 2014 1 COL NL 1B 1 0 3 0 0 0 0 NA NA NA NA NA
lemahdj01 2014 1 COL NL 2B 144 135 3539 257 413 6 99 NA NA NA NA NA
lemahdj01 2014 1 COL NL 3B 7 4 115 2 5 0 0 NA NA NA NA NA
lemahdj01 2014 1 COL NL SS 1 0 3 0 0 0 0 NA NA NA NA NA
lemahdj01 2013 1 COL NL 1B 1 0 3 2 0 0 0 NA NA NA NA NA
lemahdj01 2013 1 COL NL 2B 90 86 2250 168 271 3 57 NA NA NA NA NA
lemahdj01 2013 1 COL NL 3B 14 9 302 6 24 0 2 NA NA NA NA NA
lemahdj01 2013 1 COL NL SS 1 0 3 0 0 0 0 NA NA NA NA NA
lemahdj01 2012 1 COL NL 1B 1 0 9 1 0 0 0 NA NA NA NA NA
lemahdj01 2012 1 COL NL 2B 67 60 1527 105 204 2 33 NA NA NA NA NA
lemahdj01 2012 1 COL NL 3B 9 5 138 2 8 0 0 NA NA NA NA NA
lemahdj01 2012 1 COL NL SS 2 0 6 0 0 0 0 NA NA NA NA NA
lemahdj01 2011 1 CHN NL 1B 1 1 24 8 0 0 1 NA NA NA NA NA
lemahdj01 2011 1 CHN NL 2B 15 8 233 16 22 0 5 NA NA NA NA NA
lemahdj01 2011 1 CHN NL 3B 11 6 180 6 12 4 5 NA NA NA NA NA
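Missing values affect most calculations in R, so it helps to check for them before summarizing. The following sketch uses a few columns from LeMahieu's 2022 rows, typed in from the output above; lemahieu_2022 is a name chosen for this illustration.

```r
# LeMahieu's 2022 rows with a subset of columns, copied from the output above.
lemahieu_2022 <- data.frame(
  POS = c("1B", "2B", "3B"),
  G   = c(35, 41, 47),
  E   = c(1, 2, 1),
  PB  = rep(NA_integer_, 3),  # passed balls: not recorded at these positions
  ZR  = rep(NA_real_, 3)      # zone rating: not recorded
)

# Count the missing values in each column.
colSums(is.na(lemahieu_2022))

# A complete column sums as usual.
total_errors <- sum(lemahieu_2022$E)

# Any NA makes the sum NA unless missing values are dropped explicitly.
pb_raw     <- sum(lemahieu_2022$PB)
pb_dropped <- sum(lemahieu_2022$PB, na.rm = TRUE)
```

Here pb_dropped is 0 only because every value was removed before summing. It should not be read as a recorded total of zero passed balls.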

2.5 Salaries

The Salaries table is very simple, with only five variables. Its main limitation is its restricted time frame: the earliest season available is 1985 and the latest is 2016.

Salaries %>% 
  filter(yearID == 2016) %>% 
  sample_n(15) 
yearID teamID lgID playerID salary
2016 OAK AL hendrli01 523400
2016 CHN NL rosscza01 524500
2016 BAL AL mcfartj01 523500
2016 MIN AL rosared01 542500
2016 LAN NL howeljp01 6250000
2016 CLE AL gomesya01 2583333
2016 ATL NL perezwi01 511250
2016 TEX AL barneto01 1500000
2016 SLN NL siegrke01 539000
2016 SDN NL quackke01 521200
2016 LAN NL guerral01 7500000
2016 TEX AL odorro01 522700
2016 CHN NL solerjo01 3666666
2016 NYN NL harvema01 4325000
2016 HOU AL tuckepr01 515000

2.6 Teams

This is what the Teams table looks like for the most recent season in the data, which is 2022 in the output below. It includes 48 variables, covering team identifiers, standings, season totals for numerous statistics, home ballpark, attendance, park factors, and the team IDs used by a few other databases.

Teams %>% 
  arrange(desc(yearID)) %>% 
  head(30)
yearID lgID teamID franchID divID Rank G Ghome W L DivWin WCWin LgWin WSWin R AB H X2B X3B HR BB SO SB CS HBP SF RA ER ERA CG SHO SV IPouts HA HRA BBA SOA E DP FP name park attendance BPF PPF teamIDBR teamIDlahman45 teamIDretro
2022 NL ARI ARI W 4 162 81 74 88 N N N N 702 5351 1232 262 24 173 531 1341 104 29 60 50 740 676 4.25 0 10 33 4290 1345 191 504 1216 86 134 0.985 Arizona Diamondbacks Chase Field 1605199 98 99 ARI ARI ARI
2022 NL ATL ATL E 1 162 81 101 61 Y N N N 789 5509 1394 298 11 243 470 1498 87 31 66 36 609 556 3.46 1 9 55 4344 1224 148 500 1554 77 110 0.987 Atlanta Braves SunTrust Park 3129931 102 100 ATL ATL ATL
2022 AL BAL BAL E 4 162 81 83 79 N N N N 674 5429 1281 275 25 171 476 1390 95 31 83 43 688 632 3.97 2 15 46 4300 1406 171 443 1214 91 151 0.985 Baltimore Orioles Oriole Park at Camden Yards 1368367 102 103 BAL BAL BAL
2022 AL BOS BOS E 5 162 81 78 84 N N N N 735 5539 1427 352 12 155 478 1373 52 20 63 50 787 721 4.53 5 10 39 4293 1411 185 526 1346 85 134 0.985 Boston Red Sox Fenway Park II 2625089 108 108 BOS BOS BOS
2022 AL CHA CHW C 2 162 81 81 81 N N N N 686 5611 1435 272 9 149 388 1269 58 10 73 35 717 631 3.92 2 14 48 4343 1330 166 533 1450 102 122 0.982 Chicago White Sox Guaranteed Rate Field 2009359 103 102 CHW CHA CHA
2022 NL CHN CHC C 3 162 81 74 88 N N N N 657 5425 1293 265 31 159 507 1448 111 37 84 36 731 642 4.00 0 11 44 4331 1342 207 540 1383 96 139 0.984 Chicago Cubs Wrigley Field 2616780 100 101 CHC CHN CHN
2022 NL CIN CIN C 4 162 81 62 100 N N N N 648 5380 1264 235 18 156 452 1430 58 33 92 33 815 768 4.86 1 6 31 4270 1366 213 612 1414 81 115 0.986 Cincinnati Reds Great American Ball Park 1395770 110 110 CIN CIN CIN
2022 AL CLE CLE C 1 162 81 92 70 Y N N N 698 5558 1410 273 31 127 450 1122 119 27 81 52 634 560 3.46 1 8 51 4368 1252 172 435 1390 97 127 0.984 Cleveland Guardians Progressive Field 1295870 98 98 CLE CLE CLE
2022 NL COL COL W 5 162 81 68 94 N N N N 698 5540 1408 280 34 149 453 1330 45 20 61 40 873 802 5.06 1 6 43 4276 1516 184 539 1187 100 154 0.983 Colorado Rockies Coors Field 2597428 113 115 COL COL COL
2022 AL DET DET C 4 162 82 66 96 N N N N 557 5378 1240 235 27 110 380 1413 47 24 58 44 713 637 4.04 0 8 38 4259 1336 167 511 1195 94 137 0.984 Detroit Tigers Comerica Park 1575544 96 97 DET DET DET
2022 AL HOU HOU W 1 162 81 106 56 Y N Y Y 737 5409 1341 284 13 214 528 1179 83 22 60 42 518 465 2.90 3 18 53 4336 1121 134 458 1524 72 122 0.987 Houston Astros Minute Maid Park 2688998 101 99 HOU HOU HOU
2022 AL KCA KCR C 5 162 81 65 97 N N N N 640 5437 1327 247 38 138 460 1287 104 34 48 44 810 740 4.70 0 9 33 4248 1493 173 589 1191 82 153 0.986 Kansas City Royals Kauffman Stadium 1277686 103 105 KCR KCA KCA
2022 AL LAA ANA W 3 162 81 73 89 N N N N 623 5423 1265 219 31 190 449 1539 77 27 54 25 668 601 3.77 2 17 38 4307 1241 168 540 1383 84 134 0.985 Los Angeles Angels of Anaheim Angel Stadium of Anaheim 2457461 102 103 LAA ANA ANA
2022 NL LAN LAD W 1 162 81 111 51 Y N N N 847 5526 1418 325 31 212 607 1374 98 18 56 53 513 451 2.80 1 15 43 4354 1114 152 407 1465 83 120 0.986 Los Angeles Dodgers Dodger Stadium 3861408 106 103 LAD LAN LAN
2022 NL MIA FLA E 4 162 81 69 93 N N N N 586 5395 1241 248 20 144 436 1429 122 29 70 36 676 617 3.86 6 10 41 4312 1311 173 511 1437 69 143 0.988 Miami Marlins Marlins Park 907487 99 100 MIA FLO MIA
2022 NL MIL MIL C 2 162 81 86 76 N N N N 725 5417 1271 251 17 219 577 1464 96 30 80 37 688 615 3.83 0 11 52 4338 1238 190 521 1530 91 122 0.984 Milwaukee Brewers Miller Park 2422420 98 97 MIL ML4 MIL
2022 AL MIN MIN C 3 162 81 78 84 N N N N 696 5476 1356 269 18 178 518 1353 38 17 62 46 684 636 3.98 0 17 28 4311 1320 184 468 1336 83 121 0.985 Minnesota Twins Target Field 1801128 98 99 MIN MIN MIN
2022 AL NYA NYY E 1 162 81 99 63 Y N N N 807 5422 1308 225 8 254 620 1391 102 33 70 41 567 532 3.30 1 16 47 4355 1177 157 444 1459 74 102 0.987 New York Yankees Yankee Stadium III 3136207 101 100 NYY NYA NYA
2022 NL NYN NYM E 2 162 81 101 61 N Y N N 772 5489 1422 272 27 171 510 1217 62 22 112 44 606 570 3.57 0 19 41 4316 1274 169 428 1565 67 128 0.988 New York Mets Citi Field 2564737 96 95 NYM NYN NYN
2022 AL OAK OAK W 5 162 80 60 102 N N N N 568 5314 1147 249 15 137 433 1389 78 23 59 33 770 717 4.52 0 7 34 4279 1394 195 503 1203 92 139 0.984 Oakland Athletics O.co Coliseum 787902 96 96 OAK OAK OAK
2022 NL PHI PHI E 3 162 81 87 75 N Y Y N 747 5496 1392 255 29 205 478 1363 105 28 52 44 685 630 3.97 3 15 42 4285 1330 150 463 1423 69 129 0.988 Philadelphia Phillies Citizens Bank Park 2276736 100 100 PHI PHI PHI
2022 NL PIT PIT C 5 162 81 62 100 N N N N 591 5331 1186 221 29 158 476 1497 89 32 54 32 817 735 4.66 0 6 33 4263 1432 164 586 1250 121 152 0.979 Pittsburgh Pirates PNC Park 1257458 100 102 PIT PIT PIT
2022 NL SDN SDP W 2 162 81 89 73 N Y N N 705 5468 1317 275 18 153 574 1327 49 22 65 46 660 611 3.81 0 15 48 4330 1263 173 468 1451 76 116 0.987 San Diego Padres Petco Park 2987470 92 92 SDP SDN SDN
2022 AL SEA SEA W 2 162 81 90 72 N Y N N 690 5375 1236 229 19 197 596 1397 83 27 89 45 623 577 3.59 0 10 40 4341 1277 186 447 1391 69 114 0.988 Seattle Mariners T-Mobile Park 2287267 95 95 SEA SEA SEA
2022 NL SFN SFG W 3 162 81 81 81 N N N N 716 5392 1261 255 18 183 571 1462 64 16 95 53 697 613 3.85 1 8 39 4299 1397 132 441 1370 100 130 0.983 San Francisco Giants Oracle Park 2482686 99 99 SFG SFN SFN
2022 NL SLN STL C 1 162 81 93 69 Y N N N 772 5496 1386 290 21 197 537 1226 95 25 80 45 637 605 3.79 3 17 37 4307 1335 146 489 1177 66 181 0.989 St. Louis Cardinals Busch Stadium III 3320551 94 94 STL SLN SLN
2022 AL TBA TBD E 1 162 81 86 76 N Y N N 666 5412 1294 296 17 139 500 1395 95 37 57 31 614 544 3.41 0 10 44 4307 1260 172 384 1384 84 110 0.985 Tampa Bay Rays Tropicana Field 1128127 95 93 TBR TBA TBA
2022 AL TEX TEX W 4 162 81 68 94 N N N N 707 5478 1308 224 20 198 456 1446 128 41 47 38 743 673 4.22 1 10 37 4305 1345 169 581 1314 96 143 0.984 Texas Rangers Globe Life Field 2011361 100 101 TEX TEX TEX
2022 AL TOR TOR E 2 162 81 92 70 N Y N N 775 5555 1464 307 12 200 500 1242 67 35 55 33 679 620 3.87 0 10 46 4324 1356 180 424 1390 82 120 0.986 Toronto Blue Jays Rogers Centre 2653830 100 100 TOR TOR TOR
2022 NL WAS WSN E 5 162 81 55 107 N N N N 603 5434 1351 252 20 136 442 1221 75 31 60 37 855 785 5.00 2 4 28 4235 1469 244 558 1220 104 126 0.982 Washington Nationals Nationals Park 2026401 94 96 WSN MON WAS

2.7 Visualizations

Here are a few visualizations to help show the data available in the Lahman package.

Batting

Pitching

Fielding

Salaries

Looking Ahead: Many of these tables contain only variables that are specific to one part of baseball (such as pitching or fielding), so it is often helpful to join multiple tables together. We will talk more about creating plots in the Visualizations section.
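As a small preview of joining, the sketch below attaches team names to a few batting rows. It is only an illustration under stated assumptions: the two data frames (batting_rows and team_rows, names made up here) are typed in from the Batting and Teams output shown earlier in this chapter, and base R's merge() is used so the example runs without any packages.

```r
# Three of Iglesias's Batting rows (a few columns only), copied from section 2.2.
batting_rows <- data.frame(
  playerID = "iglesjo01",
  yearID   = c(2022, 2021, 2021),
  teamID   = c("COL", "LAA", "BOS"),
  G        = c(118, 114, 23),
  stringsAsFactors = FALSE
)

# Three 2022 Teams rows (a few columns only), copied from section 2.6.
team_rows <- data.frame(
  yearID = c(2022, 2022, 2022),
  teamID = c("COL", "LAA", "BOS"),
  name   = c("Colorado Rockies", "Los Angeles Angels of Anaheim", "Boston Red Sox"),
  W      = c(68, 73, 78),
  stringsAsFactors = FALSE
)

# Left join: keep every batting row and attach team information
# wherever both the season and the team match.
joined <- merge(batting_rows, team_rows,
                by = c("yearID", "teamID"), all.x = TRUE)
joined
```

Only the 2022 row finds a match, so the two 2021 rows get NA for name and W. Joining on teamID alone would have attached 2022 team totals to those 2021 rows, which is why both keys are used. With dplyr loaded, left_join(batting_rows, team_rows, by = c("yearID", "teamID")) expresses the same join.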