A Longitudinal Analysis of Spotify’s Top 100 Tracks

Delaney Beal

November 20, 2020

Introduction

Goal: to determine which factors influence a song's danceability, and how that influence changes over time

Statement of Problem

Methods

Spotify for Developers

Spotify Web API

Spotipy

Previous Uses

Coding Process

Data Extraction Process

An app first had to be registered through the Spotify Developer Dashboard to obtain API credentials
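
A minimal sketch of what extraction with Spotipy might look like once an app exists. It is not the code used in this project: the function names `batched` and `fetch_audio_features` are hypothetical, and the client ID and secret are placeholders for the values shown on the app's dashboard page. It assumes the client-credentials flow and batches of at most 100 track IDs per audio-features request.

```python
from typing import Dict, Iterator, List


def batched(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield successive batches of track IDs (hypothetical helper)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_audio_features(track_ids: List[str],
                         client_id: str,
                         client_secret: str) -> List[Dict]:
    """Return one audio-features dict per track that Spotify knows about."""
    # Imported here so the batching helper can be used without spotipy installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    # Assumption: app-only access via the client-credentials flow.
    auth = SpotifyClientCredentials(client_id=client_id,
                                    client_secret=client_secret)
    sp = spotipy.Spotify(auth_manager=auth)

    rows: List[Dict] = []
    for batch in batched(track_ids):
        # audio_features returns a list with None for tracks it cannot resolve.
        rows.extend(f for f in sp.audio_features(batch) if f is not None)
    return rows
```

Each returned dictionary carries fields such as `danceability`, `energy`, `valence`, and `tempo`, which map directly onto the variables analyzed later.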


   

Variables

Outcome Variable

Predictor Variables

Previous Research

Statistical Methods

\[\hat{Y} = \beta_0 + \sum_{k} \beta_k X_k \]
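
The linear model above predicts the outcome as an intercept plus a weighted sum of the predictors. The following is a rough sketch of fitting and evaluating such a model by ordinary least squares with NumPy. The data are synthetic and noise-free, not the Spotify dataset, and the names `fit_ols`, `predict`, `X_demo`, and `y_demo` are illustrative assumptions.

```python
import numpy as np


def fit_ols(X, y):
    """Return [b0, b1, ..., bp] minimizing the squared error of b0 + X @ b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(X.shape[0]), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict(beta, X):
    """Evaluate Y_hat = b0 + sum_k b_k * X_k for each row of X."""
    X = np.asarray(X, dtype=float)
    return beta[0] + X @ beta[1:]


# Synthetic, noise-free data (NOT the Spotify dataset): two made-up predictors.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(200, 2))
y_demo = 0.5 - 0.25 * X_demo[:, 0] + 0.3 * X_demo[:, 1]

beta_demo = fit_ols(X_demo, y_demo)
```

Because the synthetic outcome has no noise, `beta_demo` recovers the generating coefficients up to floating-point error. On real data the fit would also need standard errors and p-values, which a statistics package provides.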

From previous research, energy, speechiness, acousticness, valence, and tempo are significant predictors of danceability.

Initial Model

The initial regression model including all variables is: \[\hat{Y} = -7.0358 + 0.0039\mbox{ year} - 0.2561\mbox{ energy} + 0.0003\mbox{ key}\] \[+ 0.001\mbox{ loudness} - 0.0178\mbox{ mode} + 0.1798\mbox{ speechiness} - 0.1227\mbox{ acousticness}\] \[+ 0.119\mbox{ instrumentalness} - 0.0928\mbox{ liveness} + 0.3141\mbox{ valence} - 0.0009\mbox{ tempo}\] \[ + 0\mbox{ duration} + 0.0185\mbox{ time signature}\]

| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| year | 0.00 | 0.00, 0.00 | <0.001 |
| energy | -0.26 | -0.31, -0.20 | <0.001 |
| key | 0.00 | 0.00, 0.00 | 0.720 |
| loudness | 0.00 | 0.00, 0.00 | 0.610 |
| mode | -0.02 | -0.03, -0.01 | 0.001 |
| speechiness | 0.18 | 0.12, 0.24 | <0.001 |
| acousticness | -0.12 | -0.15, -0.09 | <0.001 |
| instrumentalness | 0.12 | 0.06, 0.18 | <0.001 |
| liveness | -0.09 | -0.13, -0.05 | <0.001 |
| valence | 0.31 | 0.29, 0.34 | <0.001 |
| tempo | 0.00 | 0.00, 0.00 | <0.001 |
| duration_ms | 0.00 | 0.00, 0.00 | 0.614 |
| time_signature | 0.02 | 0.00, 0.04 | 0.106 |

CI = Confidence Interval
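
As a quick consistency check on the table above, the sketch below screens predictors by p-value. The 0.05 cutoff is an assumption, since the slides do not state the significance level used. Entries reported as "<0.001" are recorded as their upper bound of 0.001, and the variable names `ALPHA`, `kept`, and `dropped` are illustrative.

```python
# Assumed significance level; the source does not state the one actually used.
ALPHA = 0.05

# p-values from the initial model table; "<0.001" is stored as 0.001.
initial_p_values = {
    "year": 0.001,
    "energy": 0.001,
    "key": 0.720,
    "loudness": 0.610,
    "mode": 0.001,
    "speechiness": 0.001,
    "acousticness": 0.001,
    "instrumentalness": 0.001,
    "liveness": 0.001,
    "valence": 0.001,
    "tempo": 0.001,
    "duration_ms": 0.614,
    "time_signature": 0.106,
}

# Keep predictors whose p-value falls below the cutoff; drop the rest.
kept = [name for name, p in initial_p_values.items() if p < ALPHA]
dropped = [name for name, p in initial_p_values.items() if p >= ALPHA]
```

Under that assumed cutoff, the dropped predictors are key, loudness, duration_ms, and time_signature.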

Final Model

After removing the variables that are not significant predictors of danceability, the regression model is: \[\hat{Y} = -16.8717 + 0.0088\mbox{ year} - 0.2426\mbox{ energy} - 0.017\mbox{ mode} + 23.877\mbox{ speechiness}\] \[ - 0.1224\mbox{ acousticness} + 0.1145\mbox{ instrumentalness} - 0.0946\mbox{ liveness} + 13.9302\mbox{ valence}\] \[ - 0.0008\mbox{ tempo} - 0.0118\mbox{ year} \times \mbox{speechiness} - 0.0068\mbox{ year} \times \mbox{valence}\]

The final model adds two interaction terms: year × speechiness and year × valence.


| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| year | 0.01 | 0.01, 0.01 | <0.001 |
| energy | -0.24 | -0.28, -0.20 | <0.001 |
| mode | -0.02 | -0.03, -0.01 | 0.002 |
| speechiness | 24 | 3.7, 44 | 0.021 |
| acousticness | -0.12 | -0.15, -0.09 | <0.001 |
| instrumentalness | 0.11 | 0.05, 0.17 | <0.001 |
| liveness | -0.09 | -0.14, -0.05 | <0.001 |
| valence | 14 | 5.1, 23 | 0.002 |
| tempo | 0.00 | 0.00, 0.00 | <0.001 |
| year * speechiness | -0.01 | -0.02, 0.00 | 0.022 |
| year * valence | -0.01 | -0.01, 0.00 | 0.002 |

CI = Confidence Interval
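
The large main-effect coefficients for speechiness and valence only make sense alongside their interaction terms. The sketch below computes the implied per-unit effect of each feature on danceability in a given year, using the rounded coefficients from the final model equation. The function name `marginal_effect` and the years 2000 and 2019 are illustrative assumptions, not values taken from the dataset.

```python
# Rounded coefficients copied from the final model equation.
FINAL_COEFS = {
    "speechiness": 23.877,
    "valence": 13.9302,
    "year:speechiness": -0.0118,
    "year:valence": -0.0068,
}


def marginal_effect(feature: str, year: int) -> float:
    """Change in predicted danceability per unit of `feature` in `year`.

    With an interaction term, the effect is main + interaction * year.
    """
    main = FINAL_COEFS[feature]
    interaction = FINAL_COEFS[f"year:{feature}"]
    return main + interaction * year


# Illustrative years only; the slides do not list the years covered.
for year in (2000, 2019):
    for feature in ("speechiness", "valence"):
        print(year, feature, round(marginal_effect(feature, year), 4))
```

With these rounded values, the speechiness effect works out to roughly 0.28 in 2000 and 0.05 in 2019, and the valence effect to roughly 0.33 and 0.20. Because the interaction coefficients are rounded to four decimals and multiplied by a year near 2000, these figures are approximate. They suggest only that both effects weaken over time.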

Testing Assumptions

Graphical Analysis

Interaction between Year and Speechiness

Interaction between Year and Valence

Conclusions

Impact

The datasets created for this project have been uploaded to Kaggle for future analysts to use.

Suggestions for Further Study
