24 Volleyball

24.1 UW Women’s Volleyball

The University of Wisconsin—Madison has a highly successful women’s volleyball team. The team has won the Big 10 title six times, most recently during the 2019 season. While they have never won a national championship, they have excelled in the post season and have been national runner-ups twice.

24.2 Volleyball Basics

In collegiate volleyball, teams play a series of sets. The first team to win three sets wins the match. The first four sets (the fourth played only if necessary) are played to 25 points, but the winner needs to win by two. Hence, if the score is tied 24-24, then a team will need more than 25 points to win the set. The fifth set is only played until 15 points, but also must be won by two points. The score of a match is the number of sets each team wins, and will be one of 3-0, 3-1, or 3-2.

During a set, for each point, one team serves the ball and the other team receives. If the serving teams wins the point, they continue serving. If the receiving team wins the point, they get to serve the next point. There are many rules about rotations, positions, legal ways to hit the ball, and so on that are unimportant for our analysis.

24.2.1 Volleyball Competition

The 332 division I teams are partitioned into 32 conferences. Wisconsin is one of 14 teams in the Big 10 conference (at some point in the past, there were only ten Big 10 teams, which explains the name). Teams in the same conference play many matches against one another, but also compete in non-conference matches against other teams. The winner of each conference qualifies for the 64-team NCAA tournament as do several other good teams selected by a committee. The tournament is a single-elimination competition where the top 16 teams are seeded and all 64 teams placed into a large bracket. Teams are paired in each round with winners advancing until a champion is determined.

24.3 2019 Season Data

In our analysis, we will examine two different data sets, each from the 332 NCAA Division I women’s volleyball teams. (NCAA is the National Collegiate Athletic Association which administers US collegiate athletic competitions.) One data set has team statistics for each of the teams. This data is in the file volleyball-team-2019.csv. The second data set contains results from almost every match played between two division I teams, either in the regular season or during a conference tournament, and is in the file vb-division1-2019-all-matches-updated.csv. (There are additional volleyball divisions which are less competitive. We eliminate matches between division I teams and teams from other divisions.)

24.3.1 Volleyball Team Season Statistics

Here is a summary of match statistics for each team and descriptions. Note that most of the statistics which are total counts will be more meaningful if they are translated into counts per set. For example, the variables Kills is the total number of kills a team recorded during the season. Teams with high numbers of kills are good, because these teams score points when they spike the ball into the opponents court. But very good teams may play a smaller number of sets per match than average teams if the very good teams sweep their opponents often and win many matches in three sets where an average team may have more matches that go four or five sets, resulting in larger total counts.

statistic description
Team The name of the college or university
Conference The name of the conference
W Number of wins
L Number of losses
Win_pct Winning percentate
Sets Number of sets played
Aces Number of aces (winning a point directly off of a serve)
Assists Number of assists (passing a ball to a teammate who then wins the point)
Block Solos Number of individual blocks (blocking the ball at the net)
Block Assists Number of blocks with two or more defenders
Digs Playing a ball after an attack to prevent it from hitting the floor
Kills Hitting a ball, directly leading to a point
Errors Hitting a ball out of play
Total Attacks Hitting a ball to the other side
Hit_pct (Kills - Errors)/(Total Attack)
Opp Kills Kills by the opponent
Opp Errors Errors by the opponent
Opp Attacks Attacks by the opponent
Opp_pct Opponent hitting percentage

24.3.2 2019 Division I Match Statistics

The second data set has a single row for each 2019 women’s volleyball match played prior to the start of the NCAA tournament where both teams were in the NCAA Division I. There were nearly 5000 such matches. A simple summary of a match lists the points that each team scored in each set. If a match lasts only three or four sets, there will be missing data in sets that did not get played.

Most volleyball matches occurred at one school’s home court. For these matches, the site is missing (NA) and the second team is the home team. The matches that list a site are played at a neutral court. The following table has more detailed descriptions of the variables.

name description
date the date
team1 team 1
team2 team 2
site location of the match if at a neutral site (team 2 is the home team when missing)
s1_1 team 1 set 1 score
s1_2 team 1 set 2 score
s1_3 team 1 set 3 score
s1_4 team 1 set 4 score
s1_5 team 1 set 5 score
sets_1 number of sets won by team 1
s2_1 team 2 set 1 score
s2_2 team 2 set 2 score
s2_3 team 2 set 3 score
s2_4 team 2 set 4 score
s2_5 team 2 set 5 score
sets_2 number of sets won by team 2
winner name of winning team
loser name of losing team
attendance number of people in attendance

24.3.3 Volleyball Data Source

Both data sets were obtained from the NCAA statistics portal, stats.ncaa.org. This web site has statistics for multiple NCAA sports for multiple seasons. The team statistics were obtained by navigating to season final statistics, downloading a separate data set for each summary statistic, and then transforming and joining the data. A screen shot of the web site looks like this.

The match statistics were acquired by scraping the site with code from the rvest package and then using additional specialized functions to extract the match scores from the raw html. HTML was acquired for each date of the year with one or more matches. Partial data from a single date looks like this.

There are a number of errors in this data. Approximately 50 of the matches have posted set scores that are impossible, such as the winning set score being greater than 25 but not exactly two points higher than the opponent score that set, or a winning score in one of the first four sets having fewer than 25 points. We made a number of automatic adjustments to correct these errors. There are an additional number of errors where the posted set scores do not match the values from other sources of information. These are more difficult to detect and correct. It is likely that the error rate is just over one percent and most errors involve relatively minor adjustments to the points scored by each team in places where there are errors. These discrepancies will have minimal effect on our subsequent analyses.

24.4 Volleyball Questions

There are a number of questions we might ask. In particular, we will be interested in finding variables that might predict the winning percentage of a team and modeling the probabilities of which team might win a match given the histories of their previous matches that season. The next couple chapters will develop these modeling ideas.