library("ggbump")
library("Lahman")
library("tidyverse")
Today I want to try making a bump plot while practicing with sports data. The Lahman data set has a lot of historical data about Major League Baseball. Data scientists have been using bump plots for a few years now, but here I want to credit code by Albert Rapp.
For today’s easy foray, let us look at the wins and losses of teams in the Teams data frame (I tend to call my data frames df for typing ease).
df <- Teams
There are about 3000 observations and 48 variables. I will need some of the column names.
colnames(df)
[1] "yearID" "lgID" "teamID" "franchID"
[5] "divID" "Rank" "G" "Ghome"
[9] "W" "L" "DivWin" "WCWin"
[13] "LgWin" "WSWin" "R" "AB"
[17] "H" "X2B" "X3B" "HR"
[21] "BB" "SO" "SB" "CS"
[25] "HBP" "SF" "RA" "ER"
[29] "ERA" "CG" "SHO" "SV"
[33] "IPouts" "HA" "HRA" "BBA"
[37] "SOA" "E" "DP" "FP"
[41] "name" "park" "attendance" "BPF"
[45] "PPF" "teamIDBR" "teamIDlahman45" "teamIDretro"
For a quick exploration, let us filter for the past 10 seasons of baseball (2012 to 2021) and select the columns I will use later.
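# keep the 2012 to 2021 seasons, even if the installed Lahman version has later ones
df <- Teams |>
  filter(yearID >= 2012, yearID <= 2021) |>
  select(yearID, lgID, franchID, divID, Rank)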
head(df)
yearID lgID franchID divID Rank
1 2012 NL ARI W 3
2 2012 NL ATL E 2
3 2012 AL BAL E 2
4 2012 AL BOS E 5
5 2012 AL CHW C 2
6 2012 NL CHC C 5
To be honest, I thought I was going to have to code up some function to rank teams by wins within the MLB divisions, but the Lahman database already has that in the Rank column!
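For comparison, here is a minimal sketch of what that hand-rolled ranking might have looked like, assuming that ordering teams by wins within each season, league, and division is a fair approximation of the standings. The column name rank_by_wins is my own invention, and ties or tie-breaker rules may make it disagree with the Rank column in places.

```r
library("Lahman")
library("dplyr")

# Sketch only: rank teams by wins within each season, league, and division.
# rank_by_wins is a hypothetical column name, not part of Lahman.
ranked <- Teams |>
  filter(yearID >= 2012, yearID <= 2021) |>
  group_by(yearID, lgID, divID) |>
  mutate(rank_by_wins = min_rank(desc(W))) |>
  ungroup() |>
  select(yearID, lgID, franchID, divID, W, Rank, rank_by_wins)

# Count how often the homemade rank agrees with Lahman's Rank column
ranked |> count(agrees = Rank == rank_by_wins)
```

Since min_rank() gives tied teams the same rank, any disagreements with Rank are a good place to look for tie-breakers.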
# first and last seasons, used to label each line at both ends
df_left  <- df |> filter(yearID == 2012 & lgID == "NL")
df_right <- df |> filter(yearID == 2021 & lgID == "NL")

df |>
  filter(lgID == "NL") |>
  # negate Rank so that first place is drawn at the top
  ggplot(aes(x = yearID, y = -Rank, color = franchID)) +
  geom_bump(size = 2) +
  geom_point(aes(x = yearID, y = -Rank, color = franchID),
             size = 5) +
  geom_label(aes(x = yearID, y = -Rank, label = franchID), data = df_left) +
  geom_label(aes(x = yearID, y = -Rank, label = franchID), data = df_right) +
  # one panel per division
  facet_wrap(. ~ divID, ncol = 1) +
  labs(title = "National League Standings",
       subtitle = "early draft of bump plot",
       caption = "Derek Sollberger") +
  theme(legend.position = "none",
        panel.background = element_blank())
Warning in f(...): 'StatBump' needs at least two observations per group
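The warning most likely comes from a franchise that has a single row in the filtered data. My guess (an assumption, not something I have checked against the ggbump source) is that this is Houston, which played in the National League in 2012 before moving to the American League in 2013. Below is a minimal sketch for finding such groups and dropping them before plotting; df_bump is a hypothetical name, and dropping the group also removes that team's lone point from the plot.

```r
library("Lahman")
library("dplyr")

df <- Teams |>
  filter(yearID >= 2012, yearID <= 2021) |>
  select(yearID, lgID, franchID, divID, Rank)

# Franchises with only one NL season per division cannot form a bump line
df |>
  filter(lgID == "NL") |>
  count(franchID, divID) |>
  filter(n < 2)

# Keep only groups with at least two seasons (df_bump is a hypothetical name)
df_bump <- df |>
  filter(lgID == "NL") |>
  group_by(franchID, divID) |>
  filter(n() >= 2) |>
  ungroup()
```

Passing df_bump to ggplot() in place of the filtered df should then give every line at least two points to connect.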