4 Finding Second Innings Highlights

Next, we make use of the idea of “resources left” in an innings and use this information alongside ball-by-ball commentary to identify 20 key second-innings turning points in the data.

Resource numbers represent the proportion of its runs that a team is expected to score over the remainder of its innings, given the overs bowled and wickets lost so far. For example, a team at the beginning of its 7th over with 3 wickets lost has 0.68 resources. That value implies that an average team still has 68% of its runs left to score. In other words, we would expect it to have scored 32% of its total runs at this point in the innings.

DLS = read.csv("DLS_T20.csv")[,-1]
colnames(DLS) <- paste("Wicket",0:9)
rownames(DLS) <- paste("Over", 0:20)

gamelog$ResourcesLeft <- 0
for (i in seq(1,nrow(gamelog))){
  #Resources Left are determined using interpolation on DLS Table
  if (gamelog$Over[i] < 19) {
    gamelog$ResourcesLeft[i] = DLS[gamelog$Over[i]+1,gamelog$Wickets[i]+1] -  
    (DLS[gamelog$Over[i]+1,gamelog$Wickets[i]+1]- DLS[gamelog$Over[i]+2,gamelog$Wickets[i]+1]) * 
    gamelog$Ball[i]/6
  }
  else gamelog$ResourcesLeft[i] = DLS[20,gamelog$Wickets[i]+1] * (1 - gamelog$Ball[i]/6)
}
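
The interpolation in the loop above can also be written as a small vectorized function. The following is only a sketch: `toy_dls` is a made-up table with the same layout as `DLS` (it does not contain the real values from `DLS_T20.csv`), and `resources_left` is a hypothetical helper name. It assumes the "Over 20" row is zero, in which case the special case for the final over is not needed.

```r
# Toy resource table (hypothetical values, NOT the real DLS_T20.csv):
# rows are overs completed (0 to 20), columns are wickets lost (0 to 9).
toy_dls <- outer(seq(1, 0, length.out = 21), seq(1, 0.1, length.out = 10))
rownames(toy_dls) <- paste("Over", 0:20)
colnames(toy_dls) <- paste("Wicket", 0:9)

# Linear interpolation between the start of the current over and the start
# of the next one, weighted by the share of the current over already bowled.
resources_left <- function(tbl, over, ball, wickets) {
  this_over <- tbl[cbind(over + 1, wickets + 1)]
  next_over <- tbl[cbind(over + 2, wickets + 1)]
  this_over - (this_over - next_over) * ball / 6
}

# Start of the innings, midway through over 6, and the last ball of over 19
resources_left(toy_dls, over = c(0, 6, 19), ball = c(0, 3, 6), wickets = c(0, 3, 9))
```

Indexing the table with a two-column matrix (`cbind(row, column)`) picks one cell per ball, which avoids the explicit loop over rows.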

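library(dplyr)

gamelog$RunsScored <- 0
#Running total of runs per match and innings; wicket rows (NumOutcome == -1) are filled in below
gamelog[gamelog$NumOutcome >= 0,"RunsScored"] <- gamelog %>% 
                                              filter(NumOutcome >= 0) %>% 
                                              group_by(MatchNo,Inning) %>% 
                                              mutate(runs = cumsum(NumOutcome)) %>% 
                                              pull(runs)
#Wicket rows are still 0: carry the running total forward within the same innings
for (i in seq(2,nrow(gamelog))) {
  if (gamelog$RunsScored[i] < gamelog$RunsScored[i-1] & 
      gamelog$MatchNo[i] == gamelog$MatchNo[i-1] &
      gamelog$Inning[i] == gamelog$Inning[i-1]) {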
    
    gamelog$RunsScored[i] = gamelog$RunsScored[i-1]
  }
}
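
For comparison, the same running total can be sketched in base R without the carry-forward loop, assuming (as above) that -1 is the only negative code in `NumOutcome`. The data frame `toy` is hypothetical and only mimics the relevant columns of `gamelog`.

```r
# Toy ball-by-ball log (hypothetical data); -1 codes a wicket, as in gamelog.
toy <- data.frame(
  MatchNo    = c(1, 1, 1, 1, 1, 1),
  Inning     = c(1, 1, 1, 2, 2, 2),
  NumOutcome = c(4, -1, 1, -1, 6, 0)
)

# Count a wicket as zero runs, then accumulate within each match and innings.
toy$RunsScored <- ave(pmax(toy$NumOutcome, 0), toy$MatchNo, toy$Inning, FUN = cumsum)
toy
```

Because a wicket adds zero runs, the row for a dismissal simply repeats the previous total, which is what the loop above achieves by copying the value forward.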

sum(is.na(gamelog$RunsScored))

gamelog$DLSDifferential <- 0
for (match in unique(gamelog$MatchNo)) {
  target_runs = as.integer(subset(home,MatchNo==match,home))+1
  gamelog$DLSDifferential[gamelog$MatchNo==match & gamelog$Inning==2] = 
  gamelog[gamelog$MatchNo==match & gamelog$Inning==2,"RunsScored"] -
  target_runs *(1-gamelog$ResourcesLeft[gamelog$MatchNo==match & gamelog$Inning==2])
}

The DLSDifferential variable tracks the difference between the number of runs the team batting in the second innings has scored at some stage and the number it should have scored by that stage, given the resources it has used, to be on course for its target. That is, it is a measure of how far “ahead of schedule” or “behind schedule” the team batting in the second innings is with respect to reaching its target (i.e. at least one more run than the opposing team).
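
As a quick worked example of this definition, here is a minimal sketch. The function name `dls_differential` and the numbers are hypothetical, chosen only to illustrate the arithmetic.

```r
# Runs ahead of (positive) or behind (negative) the resource-based schedule.
dls_differential <- function(runs_scored, target_runs, resources_left) {
  runs_scored - target_runs * (1 - resources_left)
}

# Hypothetical chase: target of 160, 60 runs on the board, 68% of resources left.
# The schedule expects 160 * 0.32 = 51.2 runs by now, so the team is ahead.
ahead <- dls_differential(runs_scored = 60, target_runs = 160, resources_left = 0.68)

# With no resources left, the differential is simply runs scored minus target.
final <- dls_differential(runs_scored = 100, target_runs = 160, resources_left = 0)
```

A positive value means the chasing team is ahead of schedule; a negative value means it is behind.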

Next, we incorporate the sentiment of the commentary for each play:

library(syuzhet)

#Replace the first double space (a pause in speech) in each note with a period.
gamelog$FullNotes <- sub("  ",".",gamelog$FullNotes) 

#Add up sentiment for each sentence in commentary
gamelog$Sentiment <- sapply(gamelog$FullNotes,function(text){sum(abs(get_sentiment(get_sentences(text))))})
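
The `abs()` inside the sum matters: a note that mixes strongly positive and strongly negative sentences should count as exciting rather than cancel out. A minimal sketch, using made-up per-sentence scores in place of real `get_sentiment` output:

```r
# Hypothetical sentence-level scores for one piece of commentary
sentence_scores <- c(0.8, -1.2, 0.5)

net_sentiment   <- sum(sentence_scores)       # positives and negatives cancel
total_intensity <- sum(abs(sentence_scores))  # what the Sentiment column stores
```

Here the net sentiment is close to zero even though every sentence is emotionally charged, while the sum of absolute values reflects the overall intensity.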

We take a look at the distribution of the sentiment scores:

library(ggplot2)
p <- ggplot(gamelog %>% filter(Inning==2),mapping=aes(y=Sentiment))+geom_boxplot() +
            theme_classic() + scale_x_discrete(breaks=NULL) + ylab("Sentiment Score")
p

There are a lot of outliers; we seek to remedy this by taking the square root of the sentiment score. We also examine the commentary for the highest score, which is in a league of its own:

gamelog$FullNotes[which.max(gamelog$Sentiment)]
## [1] "DJ Bravo pulls off an absolute blinder! He sets off on a celebratory sprint towards long-off. Could that be a game-changer like how he had run out Kohli on his follow through earlier in the tournament? The crowd loving this. This was full and on middle. Watson presses forward and lofts it in the air. It seemed destined to go over the ropes.... until Bravo at long-on leaps off his feet, stretches his right hand and plucks out a superb one-handed catch. He also held his balance, ensuring that the momentum did not take him over. Bravo then breaks into a customary jig before being flanked by Raina. Oh boy, Bravo is a crowd-pleaser. Isn't he?"

Indeed, it’s clear that the commentator got very excited over this play, as the unusually high sentiment score would suggest.

p <- ggplot(gamelog %>% filter(Inning==2),mapping=aes(y=sqrt(Sentiment)))+geom_boxplot() + 
  theme_classic() + scale_x_discrete(breaks=NULL) + ylab("Square Root Sentiment Score")
p

This distribution is less skewed. We will standardize the values to lie within the \([0,1]\) range, except for the top score, by dividing all values by the value of the second highest score. The reason for this choice is that, since the top score is in “a league of its own” in terms of excitement, it arguably deserves a considerably higher score than the rest of the plays. This will serve as the sentiment component of our score for determining the highlight-worthiness of a play.
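
To illustrate the effect of dividing by the second highest value, here is a small sketch on made-up sentiment totals (the vector `raw` is hypothetical):

```r
raw  <- c(1, 4, 9, 100)   # hypothetical sentiment totals; 100 plays the outlier
root <- sqrt(raw)         # square-root transform

# Divide by the second highest transformed value rather than the maximum
second_highest <- sort(root, decreasing = TRUE)[2]
standardized   <- root / second_highest
```

Every value except the outlier lands in the unit interval, the runner-up is exactly 1, and the outlier is allowed to exceed 1.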

The resources component will be a variable we will name GroundGained, which is the number of runs the team advanced or regressed relative to the Duckworth-Lewis target on a given play. Given its many outliers, we take the log of its absolute value and divide by 5 (a ceiling for the log of ground gained) to obtain a “ground gained score”. Finally, we take the average of the sentiment score and the “ground gained” score to obtain a highlight-worthiness score for each play.

#Square root of the second highest sentiment score (a single value, even if scores are tied)
stdizor <- sqrt(sort(gamelog$Sentiment,decreasing=TRUE)[2])
gamelog$stdized_sentiment <- sqrt(gamelog$Sentiment)/stdizor
ground_gained <- c(gamelog$DLSDifferential[1],gamelog$DLSDifferential[2:nrow(gamelog)]-
                                              gamelog$DLSDifferential[1:(nrow(gamelog)-1)])
ground_gained[gamelog$Over==0 & gamelog$Ball==1 & gamelog$Inning==1] <- 
  gamelog$DLSDifferential[gamelog$Over==0 & gamelog$Ball==1 & gamelog$Inning==1]
ground_gained[which.max(ground_gained)] = gamelog$DLSDifferential[which.max(ground_gained)]

gamelog$ground_gained <- ground_gained
gamelog$ground_gained_score <- 0
gamelog$ground_gained_score[gamelog$Inning==2] <- log(abs(ground_gained[gamelog$Inning==2]))
gamelog$ground_gained_score <- gamelog$ground_gained_score/5
#Average the two components, then keep the score for second-innings plays only
highlight_worthiness <- (gamelog$stdized_sentiment + gamelog$ground_gained_score)/2

gamelog$highlight_worthiness <- 0
gamelog$highlight_worthiness[gamelog$Inning==2] <- highlight_worthiness[gamelog$Inning==2]
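
The scoring rule for a single second-innings play can be condensed into one function. This is a sketch with a hypothetical helper name, `highlight_score`; the inputs are the standardized sentiment and ground gained values of two plays (rows 12803 and 24469) from the results table in this section.

```r
# Average of the standardized sentiment and the scaled log of |ground gained|
highlight_score <- function(stdized_sentiment, ground_gained) {
  ground_gained_score <- log(abs(ground_gained)) / 5
  (stdized_sentiment + ground_gained_score) / 2
}

# Rows 12803 and 24469 of gamelog, using the rounded values printed in the table
scores <- highlight_score(c(1.0000000, 0.7537784), c(-2.79, 6))
```

Up to rounding of the printed inputs, these reproduce the tabulated highlight_worthiness values of roughly 0.6026 and 0.5561. Note that a play with exactly zero ground gained gets `log(0) = -Inf`, so it falls to the bottom of the ranking.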

Now it’s time to see whether the highlight-worthiness score matches our intuition:

top20 = which(gamelog$highlight_worthiness %in% sort(gamelog$highlight_worthiness,
                                                     decreasing=TRUE)[1:20])
gamelog[top20,c(1:7,15,21,22,24,26:28)]
##        Format MatchNo TeamBowling TeamBatting Inning Over Ball NumOutcome Wickets ResourcesLeft DLSDifferential
## 12803    T20I     146         AUS          WI      2   17    6         -1       7     0.1100000      -34.200000
## 24469    T20I     187         PAK         ENG      2    1    1          6       0     0.9545000        7.905000
## 25000    T20I     228         ENG         PAK      2   13    4          6       6     0.3006667      -17.599333
## 35038    T20I     105         IND          WI      2   18    4          6       3     0.0840000       14.936000
## 37548    T20I     194          NZ         PAK      2    2    1          1       1     0.8895000        4.447000
## 47714    T20I     336          SA         PAK      2   15    1         -1       6     0.2461667       -2.828833
## 57953    T20I     409          SA         AUS      2   18    3          4       7     0.0850000        7.410000
## 59127    T20I     423         ENG          NZ      2   14    6         -1       7     0.2210000      -18.568000
## 59136    T20I     423         ENG          NZ      2   16    2         -1       9     0.0780000      -42.024000
## 61643    T20I     351          SL         PAK      2   10    3         -1       6     0.3905000      -44.604500
## 64835    T20I  200901          SL          WI      2   12    2          6       4     0.4060000        5.290000
## 64849    T20I  200901          SL          WI      2   14    4          0       8     0.1700000      -35.450000
## 69229    T20I     385          WI          BD      2    3    2         -1       2     0.8133333      -15.920000
## 69663    T20I     391          SA         ENG      2    7    3         -1       2     0.6530000        4.641000
## 69876    T20I     387          SL         ENG      2    0    6         -1       1     0.9330000      -12.596000
## 76769     IPL  200917          MI         KKR      2   15    2         -1       8     0.1600000      -62.920000
## 99946     IPL  201167          DC         KXP      2   15    5         -1       7     0.1943333      -58.327667
## 101502    IPL  201174         CSK         RCB      2   19    6          6       8     0.0000000      -59.000000
## 146417    IPL  201552          SH         RCB      2    5    5          6       4     0.6390000       34.626000
## 155673    IPL  201547         CSK          RR      2    9    6         -1       2     0.5440000       -9.048000
##        highlight_worthiness stdized_sentiment ground_gained
## 12803             0.6026042         1.0000000     -2.790000
## 24469             0.5560651         0.7537784      6.000000
## 25000             0.5540127         0.7784989      5.194667
## 35038             0.5577632         0.8164966      4.460000
## 37548             0.5437211         0.7637626     -5.045000
## 47714             0.5420411         0.8118441     -3.900833
## 57953             0.5375591         0.8703883      2.783333
## 59127             0.5553920         0.7124435     -7.328000
## 59136             0.5400701         0.5707518    -12.768000
## 61643             0.6118782         0.7177406    -12.554500
## 64835             0.5789131         0.8572330      4.495000
## 64849             0.6040645         0.6853444    -13.652500
## 69229             0.5411538         0.7124435     -6.355500
## 69663             0.5728767         0.8164966     -5.187667
## 69876             0.5737736         0.7736180     -6.486000
## 76769             0.5455976         0.6215816    -10.465333
## 99946             0.5573914         0.7487363     -6.235333
## 101502            0.5616351         0.8438727      4.043000
## 146417            0.5638159         0.7929615      5.330000
## 155673            0.6206224         1.2029005     -1.211333

for (play in top20) {
  print(gamelog[play,"FullNotes"])
}
## [1] "Steve Smith takes another beauty! Even better than the first one. bowled it on a good length on the off, Morton swung it hard and connected well but Smith was there waiting, it appeared as though it would clear him but he timed his jump perfectly and intercepted the ball, clasping it over his head and maintaining his balance to ensure he didn't step over the ropes, superb stuff"
## [1] "Kieswetter takes full toll! He backs away to leg and slaps a length delivery over cover point for six! It looked ungainly. but it was mighty effective - that sailed over the rope!"
## [1] "well. well. Picks up another full-length delivery and strikes it well clear of long-off. Pressure back on Bopara"
## [1] "Bravooooo! Whaddaaplayaaaaa! Low full toss on the middle. he moved outside leg and hit it gorgeously inside-out over extra cover for the match-winning six. What a stylish finish. Fabulicous shot to win the game. Bravo is hugged by Chanders. The West Indian fans in the crowd jump for joy."
## [1] "well fielded at short fine leg. short of a good length on middle, clipped towards short fine and Mills does superbly stretching to his right"
## [1] "Pakistan are seriously losing it. Bowls this on a good length in the off stump channel. with Maqsood looking to hit leg side, and de Kock completing a stunning take to his right. This sets of even more delirious scenes from the wickets in the previous over. South Africa starting to turn the tide in a big way. Also completes a team hat-trick for South Africa"
## [1] "short again. not a bad ball... but superbly played, ramping four fine of third man! The delivery was right over the top of the stumps but Abbott did terrifically well to use the pace and flick it to the rope, past the diving fielder"
## [1] "full length just outside off. Southee clearing the front leg strikes it sweet as a nut down the ground but straight to mid-off! Well held by Morgan, it was straight to him but hit bloody hard, right off the meat of the bat but Southee picked out Morgan and is that New Zealand's chance gone now?"
## [1] "goes for a reverse-paddle to a slower ball and chips it straight to the keeper! That will do nicely for England. Hiding to nothing really for McClenaghan. going for the spectacular first up, gets this off a top edge and it lobbed up to Bairstow"
## [1] "going round the wicket. with the ball drifting across the batsman, as Bhatti leans forward in defense, but misses the line of the ball, with Sanga doing well to take the ball and stump him in rapid time as they go for the appeal. The third umpire is called in and it does appear that Bhatti was overbalancing with his foot on the crease. The decision is given out and Bhatti have now lost four wickets in seven balls"
## [1] "fired in flat and in the slot. Bravo just steps forward and swings free. Sri Lanka missing a trick here. They are bowling too fast and the WI batsmen are treating them like medium pacers. No need to worry about turn"
## [1] "bowls over the stumps now. and its a pretty good ball for the hat-trick too. Right at the stumps and keeping a shade low too. Probably because the batsman was plying a fuller ball off the back foot. Defended well enough though"
## [1] "it gets worse for Bangladesh! Absolutely brilliantly disguised stunning delivery! Another slower one outside off. another one that cuts a bit after pitching.. The batsman looks to push it to off, but leaves a gap between the bat and pad and sees flashing lights behind him"
## [1] "back of a length. Moeen goes to pull and gets a tickle on it that and it carries low to de Kock who claims the chance nicely down low to his right hand side diving forward, excellent catch and the South Africans are cock-a-hoop, two huge wickets in consecutive balls"
## [1] "double strike! This would have been a pretty decent ball in a Test match. It's full. swinging late, but not too much. Moves a little off the seam as well, drawing Moeen into the shot. He drives, the outside edge is collected and it flies low to second slip. You read that right. Second slip. What a start for Kule."
## [1] "leg stump again. it's all over, Malinga can't go for the hat-trick because Anureet Singh won't bat. This was a perfect yorker, full, fast, low, base of leg stump, Dinda saves the toe, little mercy that"
## [1] "Hattrick for Amit! He is beyond joy. And is mobbed by his team-mates. It was the googly that bounced extra even as it broke in from outside off and Harris was cramped as he went for the cut. They had the second slip/gully for this hat-trick delivery and it flew to him. Good sharp catch. Amit hops in joy"
## [1] "A lot has changed since last year. Lalit Modi is gone. Two new teams have come in. The off-field excesses have been reduced. The on-field excesses have increased. One thing. however, hasn't changed. A virtually unchanged Chennai Super Kings side retains the IPL. MS Dhoni walks on water. Backwards. Blind-folded. On a tight-rope. And oh yeah, Tiwary launches the last ball over extra cover for a six. Too little. Far too late. No one even noticed. The fireworks erupt in Chepauk. Whistle podu!"
## [1] "RCB have won it! Warner has touched the rope while looking to tack a catch on the boundary! Kohli goes mad! He's played an immensely awesome final over. This ball though wasn't the best. it was length and he got under it. Lifts it high into the air, Warner tracks it well and takes the catch. He's celebrating, but as he backpeddles doing so, he trods on the turf"
## [1] "DJ Bravo pulls off an absolute blinder! He sets off on a celebratory sprint towards long-off. Could that be a game-changer like how he had run out Kohli on his follow through earlier in the tournament? The crowd loving this. This was full and on middle. Watson presses forward and lofts it in the air. It seemed destined to go over the ropes.... until Bravo at long-on leaps off his feet, stretches his right hand and plucks out a superb one-handed catch. He also held his balance, ensuring that the momentum did not take him over. Bravo then breaks into a customary jig before being flanked by Raina. Oh boy, Bravo is a crowd-pleaser. Isn't he?"

Indeed, these all look like plays worthy of a highlight reel. They also cover a variety of scenarios:

  • 11 Wickets, many earned in outstanding fashion
  • 6 Sixes, several struck to finish the game off, and one hit off the final ball as Chennai Super Kings (CSK) claimed the IPL title (Match 201174)
  • A good defensive play (Match 200901)
  • A sharp piece of fielding that held the batsman to a single (Match 194)
  • A Four to take the batting team within a few runs of victory (Match 409)

The commentary is generally very lively, with standardized sentiment scores of 0.75 or greater for the most part. Plays with somewhat lower sentiment scores are compensated by higher ground gained scores (i.e. they were plays on which the batting team fell 10 or more additional runs behind schedule by losing a wicket while chasing a high target). Interestingly, 75% of the plays (15 of 20) come from the T20I format, despite there being an even split between T20I and IPL matches in the dataset overall. International cricket looks like it could be more exciting, much as the World Cup is for soccer.