Getting data from Spotify

Hello data and music lovers! In this post I’m going to share with you how to get data about your favorite artists and their albums from Spotify.

Recently my wife got us a subscription to Spotify, and I was wondering if this app, with its huge collection of artists and records, could be mined. For some time now I’ve wanted to make a viz about one of my all-time favorite bands (viz coming soon… hopefully), and maybe Spotify data could help me out with this. A quick Google search led me to the Spotify for Developers page, where I found that they have an API to get a wide variety of data about artists, albums, tracks and even your playlists!

OK, now I know I can get the data, but how do I get it? The page has documentation on how to integrate the API into your application, but I know nothing about programming APIs or how to interact with them. After asking my kind network on Twitter I got some guidance, but nothing clear enough for me to get the data. Another Google search (and many word combinations) got me to an R package called “spotifyr”. Bingo! Now I can try to get the data.

There was a lot of reading and going back and forth, but in the end I managed to make my code work. Here you will find my code and the instructions to connect to the Spotify API. It’s nothing complicated, and if you know a little R you can easily modify it to get what you want, or use it as an idea of how to do it. First, you need a Spotify account; then go to the Spotify for Developers page.

It will ask you to do the usual stuff: accept an agreement, confirm, etc. Once in, you will be on the Dashboard; click the Create Client ID button.

A new window will pop up; fill in the required information. For the “What are you building?” question I selected “I don’t know”. A yellow warning will appear, but don’t worry, the connection will still work. Click Next.

Another window will appear; check all the agreement boxes and click Submit.

Now you should be on your app’s page and see your Client ID (a string of numbers and letters) and your Client Secret (after clicking “Show Client Secret”). Both values are required to make the connection work, and you will use them in your code.

OK, we are done on the Spotify page. Next… coding in R!

GETTING THE DATA WITH R AND THE SPOTIFYR PACKAGE

For this part of the post I will just paste my R code, and you can follow along by reading the comments in it:

# -- You will need the following packages --
#install.packages('tidyverse')
#install.packages('spotifyr')
# ------------------------------------------

# Loading the packages into R
library(spotifyr)
library(tidyverse)

#setwd("here you should write your computer path to your working directory")

#Code below will help to connect to the Spotify API
Sys.setenv(SPOTIFY_CLIENT_ID = 'your spotify client id goes here')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'your spotify client secret goes here')

#Getting your access token, it will be used by many functions
access_token <- get_spotify_access_token()

#Getting the Spotify ID for the artist/band you want; search_spotify()
#returns the best match first
artist <- "artist/band you want to extract data from"
artist_spotifyID <- search_spotify(artist, type = "artist",
                                   authorization = access_token)$id[1]

#Getting the artist's albums data available on Spotify and saving it in
#the albums variable

albums <- get_artist_albums(artist_spotifyID, include_groups = c("album"),
                            limit = 50, market = "US", offset = 0,
                            authorization = access_token,
                            include_meta_info = FALSE) %>%
  #line below adds an id column called "num" to identify each row of data
  mutate(num = row_number()) %>%
  #line below is used to select the albums you are interested in; use the
  #num column to filter. If you want all albums, just delete or comment out
  #the line
  #filter(num %in% c(22,20,18,15,14,12,11,8,6,3)) %>%
  #line below selects some columns from all the data that the API returns;
  #if you want different columns, just add their names
  select(id, name, release_date, total_tracks)

#Getting the tracks from each album and saving them in the tracks variable
tracks <- albums$id %>%
  #line below applies a function to get all tracks from each album
  #saved in the albums variable (50 is the maximum limit per request)
  map_dfr(~ get_album_tracks(.x, limit = 50, offset = 0, market = NULL,
                             authorization = access_token,
                             include_meta_info = FALSE)) %>%
  #line below selects some columns from all the album track data that
  #the API returns; if you want different columns, just add their names
  select(id, track_number, name, duration_ms)

#Getting track info for each track collected by the previous code
#section and saving it in the tracks_info variable

tracks_info <- tracks$id %>%
  #line below applies a function to get the info for each track
  #saved in the tracks variable
  map_dfr(~ get_tracks(.x, market = NULL, authorization = access_token)) %>%
  #line below selects some columns from all the columns that
  #the API returns; if you want different columns, just add their names
  select(id, name, popularity, album.id, album.name, album.total_tracks)

#Getting the audio features for each track collected by the previous code
#section and saving them in the track_audio_features variable

track_audio_features <- tracks$id %>%
  map_dfr(~ get_track_audio_features(.x, authorization = access_token)) %>%
  #line below drops the link and metadata columns you won't need
  select(-c(type, uri, track_href, analysis_url))

#Saving each variable (dataset) in CSV format, each one in its own file
#(using the same file name for all of them would overwrite the previous one)
write_csv(albums, "albums.csv")
write_csv(tracks, "tracks.csv")
write_csv(tracks_info, "tracks_info.csv")
write_csv(track_audio_features, "track_audio_features.csv")

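In the code above, the lines that require your attention are the ones with placeholder text (such as your Client ID, your Client Secret and the artist name), which you need to replace with your own values, and the commented-out lines (install.packages(), setwd() and filter()), where you need to remove the hashtag if you want them to run. You don’t need to modify the rest of the code, and the other lines starting with a hashtag are comments that guide you through what’s happening.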
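Instead of typing your Client ID and Client Secret into the script, you can keep them out of your code. spotifyr’s get_spotify_access_token() reads the SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET environment variables by default, so the two Sys.setenv() lines can be replaced by entries in your ~/.Renviron file, which R reads at startup. Below is a minimal sketch; spotify_credentials_set() is a hypothetical helper (not part of spotifyr) that only checks both variables are present before you ask for a token.

```r
# Hypothetical helper (not part of spotifyr): TRUE when both Spotify
# credentials are available as non-empty environment variables
spotify_credentials_set <- function() {
  vars <- c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
  all(nzchar(Sys.getenv(vars)))
}

# Example ~/.Renviron entries (placeholders, use your own values):
# SPOTIFY_CLIENT_ID=your_client_id
# SPOTIFY_CLIENT_SECRET=your_client_secret

if (spotify_credentials_set()) {
  message("Credentials found, you can call get_spotify_access_token()")
} else {
  message("Set SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET first")
}
```

After editing ~/.Renviron, restart R so the new variables are loaded.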

I decided to save each kind of information in a separate dataset to make the code friendlier (my code is not elegant at all and follows my way of thinking, which sometimes is not practical, but I can understand it).
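One possible refinement: the script calls get_tracks() once per track. The spotifyr documentation says get_tracks() accepts a vector of up to 50 track IDs, so the IDs could be sent in batches to make fewer API calls. Here is a minimal sketch under that assumption; chunk_ids() is a hypothetical helper, and the commented usage lines reuse the tracks and access_token objects (plus the tidyverse functions) from the script above.

```r
# Hypothetical helper: split a vector of IDs into batches of at most `size`
chunk_ids <- function(ids, size = 50) {
  # batch number for each position: 1,1,...,1,2,2,... (order is preserved)
  batch <- ceiling(seq_along(ids) / size)
  unname(split(ids, batch))
}

# Usage with the objects created in the script (needs API access):
# tracks_info <- chunk_ids(tracks$id) %>%
#   map_dfr(~ get_tracks(.x, authorization = access_token)) %>%
#   select(id, name, popularity, album.id, album.name, album.total_tracks)
```

With 120 tracks, for example, this makes 3 requests instead of 120.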

The key to all the Spotify API functions is the artist ID, album ID and track ID, which are the unique IDs Spotify uses to retrieve data. If you want to use other functions to extract different data (you can also retrieve data from your account and playlists), it is very important to get the IDs first.
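If you already have a Spotify share link or URI for an artist, album or track, the ID is its last part. A minimal base R sketch; spotify_id_from_link() is a hypothetical helper, and it assumes the usual formats https://open.spotify.com/<type>/<id>?... and spotify:<type>:<id>, with the 22-character alphanumeric IDs Spotify uses.

```r
# Hypothetical helper: extract the Spotify ID from share URLs or URIs.
# Assumes https://open.spotify.com/<type>/<id>?si=... or spotify:<type>:<id>,
# where <id> is 22 alphanumeric characters; returns NA when nothing matches
spotify_id_from_link <- function(link) {
  pattern <- "(artist|album|track|playlist)[/:]([A-Za-z0-9]{22})"
  m <- regmatches(link, regexec(pattern, link))
  # each match holds the full match plus 2 capture groups; keep the ID
  vapply(m, function(x) if (length(x) == 3) x[3] else NA_character_,
         character(1))
}

# Made-up ID, for illustration only
spotify_id_from_link("spotify:track:1234567890abcdefghijKL")
```

The resulting ID can be passed directly to functions such as get_artist_albums() or get_tracks().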

Once you have all your data, you can use it in Tableau and link the different datasets by track name, artist name or Spotify ID. The IDs are the safer option, since different tracks can share the same name.
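If you prefer to combine the datasets in R before exporting them, the shared id column does the job. The sketch below uses base R merge() on two tiny made-up data frames standing in for tracks and track_audio_features (the values are placeholders, not real Spotify data); with the tidyverse loaded, left_join(tracks, track_audio_features, by = "id") does the same.

```r
# Made-up stand-ins for the tracks and track_audio_features datasets
tracks_demo <- data.frame(
  id = c("t1", "t2", "t3"),
  track_number = 1:3,
  name = c("Song A", "Song B", "Song C"),
  stringsAsFactors = FALSE
)
features_demo <- data.frame(
  id = c("t3", "t1"),
  energy = c(0.9, 0.4),
  stringsAsFactors = FALSE
)

# Left join on the Spotify track ID: every track is kept, and tracks
# without audio features get NA in the feature columns
combined <- merge(tracks_demo, features_demo, by = "id", all.x = TRUE)
combined <- combined[order(combined$track_number), ]
print(combined)
```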

For more details, check the spotifyr R package reference page and the Spotify Web API reference page.

If you have any questions, leave me a comment and I will try to help you.

Until next time!
