Why do you love that music

- The mystery that has left artists, record labels, or researchers in wonder until now

Join Kamin and Yangyang, the Integrated Design and Management fellows at MIT, in an attempt to explore the pattern behinds worldwide hits like Ed Sheeran's Shape of You or Dua Lipa's New Rules.

For the first time in the history, we have studied people around the world what are they listening to, and how often do they listen to each song.

Dataset

Where do data come from

Spotify Global Top 200 Charts

Dec 23, 2016 - April 20, 2018
3,000 weekly charts
600,000 items
1,000 unique songs

Top 10 Most Streamed Songs

Ed Sheeran and Luis Fonsi are phenomena! They both have two songs on the list. One of the songs is even the same song with different remixes.

Shape of You
Ed Sheeran
Despacito – Remix
Luis Fonsi
New Rules
Dua Lip
Something Just Like This
The Chainsmokers
Despacito
Luis Fonsi
HUMBLE.
Kendrick Lamar
Perfect
Ed Sheeran
Congratulations
Post Malone
I Don't Wanna Live Forever
ZAYN
Mi Gente
J Balvin

Artists With the Most Songs on Charts

XXXTENTACION is the top artist here! From the list, electronic, hip-hop, and dance music definitely own the chart in 2017-2018.

What makes the song soar to the top of the chart?

Looking through audio features

Spotify analyzes and represents each song with 13 audio features that characterize the audio. Some are simple such as the tempo or duration – while some are more complex like how danceable or energetic the song sounds.

All audio features are fetched from Spotify API

To characterize songs, Spotify analyzes every song with 13 audio features. Some of the features, such as the tempo and duration, are intuitive; while the others are more complicated, like danceability and energy. With the data of audio features fetched from Spotify API, we are going to show you our exploration of the patterns behind the songs that you love the most!

Danceability

Audio Feature

Mean

0.679

Max

0.679

Min

0.679

Standard Deviation

0.679

What it is

Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable, and 1.0 is most danceable.

How do they sound

Stay (with Aless...
Dimitri Vegas & Like...
Mean
Drip (feat. Migos)
Cardi B
Max
It's the Most...
Andy Williams
Min

What does it mean

The distribution of hit songs' danceability is left-skewed. Which mean the majority of songs that rock 2017 has high danceability.

Energy

Audio Feature

Mean

0.679

Max

0.679

Min

0.679

Standard Deviation

0.679

What it is

Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

How do they sound

Stay (with Aless...
Twenty One Pilots
Mean
Drip (feat. Migos)
Linkin Park
Max
It's the Most...
XXXTENTACION
Min

What does it mean

The Energy's histogram is slightly more distributed than the Danceability's. Still, people lean towards energetic, fast, loud, and high energy songs.

Key

Audio Feature

What it is

The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on.

What does it mean

We are skeptical about the result as it shows the most popular key is Db (1). I'm as the music produce never produce a song and rarely ever ran into the song in key Db. So, I pick a sample song to check whether the key analysis algorithm is accurate. The sample that I choose is The Dai Dai's หากค่ำคืน (Hak-Kam-Kuen) which is actually my song released in 2016. The song is actually in the key of F, but Spotify API identified as Bb. So I'm skeptical that the data may not be so accurate.

Loudness

Audio Feature

Mean

0.679

Max

0.679

Min

0.679

Standard Deviation

0.679

What it is

The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typical range between -60 and 0 dB.

How do they sound

Stay (with Aless...
P!nk
Mean
Drip (feat. Migos)
MC Jhowzinho e MC...
Max
It's the Most...
XXXTENTACION
Min

What does it mean

As we learned the correlation between Loudness and Energy. We can guess that the histogram of Loudness would turn out similarly. Actually, it is even more left-skewed. As my music producer friends used to joke – the more loud, the more popular your song is going to be. So it is sadly true both statistically and from the real experience.

Mode

Audio Feature

What it is

Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.

What does it mean

The songs that made onto the chart are more classified as Major. Although, in this case, Mode is analyzed through only the melodic content in the track. It is still not true that people love happy songs more. As in various cases, minor music could sound happy even if people do not understand the lyrics. And it's even harder to judge if the song is Major or Minor by only the melody in the first place.

Speechiness

Audio Feature

Mean

0.679

Max

0.679

Min

0.679

Standard Deviation

0.679

What it is

Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audiobook, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 illustrate tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

How do they sound

Stay (with Aless...
NF
Mean
Drip (feat. Migos)
XXXTENTACION
Max
It's the Most...
Ed Sheeran
Min

What does it mean

The majority of the songs have around 0.023-0.236 which is below 0.33. That's mean there's a high probability that they don't contain any rap or spoken words section in the songs. However, I'm skeptical how accurate the audio analyzer is as many hit songs in 2017 actually contain the rap section.

Acousticness

Audio Feature

Mean

0.679

Max

0.679

Min

0.679

Standard Deviation

0.679

What it is

A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

How do they sound

Stay (with Aless...
VÉRITÉ
Mean
Drip (feat. Migos)
Mykola Dmytrovych...
Max
It's the Most...
Ozuna
Min

What does it mean

Not so many acoustic songs made onto the chart in 2017. Sorry!

Instrumentalness

Audio Feature

Mean

0.679

Max

0.679

Min

0.679

Standard Deviation

0.679

What it is

Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal." The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.

How do they sound

Stay (with Aless...
Galantis
Mean
Drip (feat. Migos)
Petit Biscuit
Max
It's the Most...
The Chainsmokers
Min

What does it mean

There's only one hit song that has the instrumentalness slightly higher than 0.5 – Petit Biscuit's Sunset Lover has 0.503 instrumentalness. And it's the electronic house song with a lot of vocoded "ooh" "aah" sound.

Liveness

Audio Feature

Mean

0.679

Max

0.679

Min

0.679

Standard Deviation

0.679

What it is

Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

How do they sound

Stay (with Aless...
P!nk
Mean
Drip (feat. Migos)
Steve Aoki
Max
Drip (feat. Migos)
Bruno Mars
Max

What does it mean

The histogram is heavily right-skewed. Most of the hit songs are not performed live (in the sense of live performance.) And as I looked into the outliner like Steve Aoki's All Night which has 0.919 liveness, it also is not the live performance. So I doubt the accuracy Spotify's algorithm.

Valence

Audio Feature

Mean

0.679

Max

0.679

Min

0.679

Standard Deviation

0.679

What it is

A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

How do they sound

Stay (with Aless...
Adele
Mean
Drip (feat. Migos)
The Beach Boys
Max
It's the Most...
Lil Uzi Vert
Min

What does it mean

The valence is pretty normally distributed here. Both negative and positive songs have their own place.

Tempo

Audio Feature

Mean

0.679

Max

0.679

Min

0.679

Standard Deviation

0.679

What it is

The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

How do they sound

Stay (with Aless...
Jax Jones
Mean
Drip (feat. Migos)
Andy Williams
Max
It's the Most...
Mykola Dmytrovych...
Min

What does it mean

The tempo of hit songs is varied and distributed across the board. However, although the tempo analysis is quite accurate for most songs. I noticed they have difficulty analyzing the song without a clear percussive or drums beat, i.e., when it misclassified Mykola Dmytrovych Leontovych's Carol of the Bells from 6/8 (or 3/4) to 4/4 time signature. That messed up the tempo analysis from around 52 to 46. Another example is Ziv Zaifman's A Million Dreams, again, the analyzer misclassified the time signature from 4/4 to 3/4, messing up from the tempo of 145 to 54. Somehow, the analyzer also messes up with the song with clear beats too with an understandable reason, like when it misclassified J Balvin's Machika from 107 to 212, which is understandable as it's almost double time.

Duration (ms)

Audio Feature

Mean

0.679

Max

0.679

Min

0.679

Standard Deviation

0.679

What it is

The duration of the track in milliseconds.

How do they sound

Stay (with Aless...
Rita Ora
Mean
Drip (feat. Migos)
Ed Sheeran
Max
It's the Most...
Kendrick Lamar
Min

What does it mean

The average duration of hit songs is 214,963 milliseconds or around 3.5 minutes and the standard deviation is around 45 seconds which is typical for most pop songs. Nothing special here.

Time Signature

Audio Feature

What it is

An estimated overall time signature of a track. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure).

What does it mean

Most hit songs have 4/4 time signature which is almost 99% in pop music. However, as I looked into the outliner, Sam Smith's Have Yourself A Merry Little Christmas, which is the only song classified as 1/4 time signature. The song is actually 4/4 but was performed rubato, or in a flexible tempo, slightly speeding up and slowing down to make the music sounds more flowing. This is common in orchestral or acoustic performance between piano and vocal like this song. That is why Spotify analyzer has a hard time correctly identify the time signature of this song.

Discussion

What is next

As we try to understand the correlation between each audio features and popularity, we first plot a scatterplot to see the strength and direction of the linear relationship between two variables. As a result, it does not show the strong indication of linear relationship, the computed coefficients of correlation may don't mean much. Nevertheless, here are top 3 audio features with the highest coefficient of correlation and coefficient of determination (for your amusement.)

Loudness 0.01256

Valence 0.01256

Danceability -0.01256

We were convinced that the dataset may be too small. We would love to compile a bigger dataset in the next step. In a meanwhile, your suggestions and feedbacks are all welcomed!

Scatter plot of Danceability vs. Streams