Popularity ~ Feature Analysis for INF6027

Investigating quantitative features in music

Introduction

Investigating quantitative features in music:

A statistical analysis of popularity and its relationship to musical features.

Music forecasting is a highly discussed topic within the music industry, as identifying the formula for "success" can give artists and producers a commercial advantage regarding investment opportunities. Despite its importance, it remains as an obstacle in the music industry due to the inherently contextual nature of song popularity.

This report intended to provide additional perspectives to this field of knowledge, through examination of quantitative data present in all audio tracks. Additionally, the report aimed to define what musical features are associated with popular songs and genres. The specific research questions were listed as follows:

Which musical feature most impacts a song’s popularity?
What features distinguish high – and low –popularity genres?
Can genre popularity be predicted?

Sourcing

The dataset was sourced from the Maharshipandya Dataset.

The dataset contains 114,000 tracks with distinct musical features spanning across 114 genres, with each track containing unique numerical features with ranging scales and values.

Packages

The report was carried out using R programming language Version 4.4.1 (2024-06-14 ucrt) and software RStudio Desktop for winOS version 2024.12.0+467. Besides the base R packages, the following packages were also used:

tidyverse
car
corrplot
RColorBrewer
treemapify
ggalt

Composite visualisations

The dataset and code generated composite visualisations depicting the predictive power of models employed throughout analysis. These were visualised to provide a detailed view of musical features and their relationships to popularity across genres:

Univariate popularity ~ feature analysis: Presented relationships between popularity across genres, and their musical features.
Multivariate popularity ~ features analysis: Displayed genre - feature associations to evaluate differences between high/low popularity genres.

Steps to run the code:

1. Prerequisites

Before running the code, ensure the following software is installed:

R(Version 4.0 or later)
RStudio (Integrated Development Environment, IDE)
Git (to clone the repository)

2.Clone the repository and verify branch

Download the project files by cloning the repository. This can be performed by running the following command in your IDE:

git clone https://github.com/G-Imola/Popularity-Feature-Analysis.git

After cloning, verify that the active Git branch is set to main.

To check the branch, run the following command:

git branch

You've done this correctly if the output shows *main.

If the branch is not set to main, you can switch to the main branch by following these steps:

#enter the terminal on your selected IDE and input the command below:

cd Popularity-Feature-Analysis


#Following this, type the code below:

git checkout main


#Finally, test to verify "main" branch has been selected:

git branch

This ensures that you're working on the correct branch for the project.

After following the steps above, your IDE should display the repository, alongside all other data that comes with it!

3. Dataset Placement

Ensure that the Original Dataset is downloaded and placed in the root directory of the cloned repository.

4. Execute the script

Open R file.R (link) in Rstudio, or your preferred IDE, and run the script sequentially to generate:

Descriptive box plots
Correlation plots
Composite scatter-plots of models
Additional descriptive plots

5.Outputs

Generated visualisations are all stored in the plots directory.

The outputs include:

Faceted univariate plot.png
Faceted MVR plot.png
Popularity-genre Box plot.png
Corrplot full data(grouped).png
Corrplot full data(ungrouped).png

Additionally, generated .csv files are stored in the csv ouputs folder, which contain various outputs from the dataset, including:

MVR test residuals.csv
Full MVR Model output.csv
vif scores for training data.csv
Test-dataset predicted Avg.csv
and more!

key findings

Primary predictors of popularity:
- High popularity tracks presented high danceability, loudness, low instrumentalness and liveness.
  - Suggests high popularity tracks are highly loud danceable tracks with little instruments and studio-produced.
- low popularity tracks presented high accousticness, and low significant values for valence,
  - Suggests low popularity songs are primarily acoustic, slower songs with less rhythm and tempo.
Linear modelling provides high predictive precision, however displays low explanatory power (R^2 = 7%).
- This suggests that popularity is primarily determined through non non-musical drivers, i.e marketing and trends.
Genre, as a category, masks features present when displayed amongst aggregated data
- This implies that smaller samples with less genres allow for greater predictability, but poses the risk of skewed/biased data.

License

This project is licensed under CC BY 4.0

See license file for more details.

Contacts

For queries or further information, please contact:

Name: Gianmarco Imola Email: gian.imola2003@gmail.com GitHub: https://github.com/G-Imola

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
Excel sheets		Excel sheets
plots		plots
LICENSE		LICENSE
Original Dataset		Original Dataset
R file.R		R file.R
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Popularity ~ Feature Analysis for INF6027

Investigating quantitative features in music

Introduction

Investigating quantitative features in music:

A statistical analysis of popularity and its relationship to musical features.

Sourcing

Packages

Composite visualisations

Steps to run the code:

1. Prerequisites

2.Clone the repository and verify branch

3. Dataset Placement

4. Execute the script

5.Outputs

key findings

License

Contacts

About

Uh oh!

Releases

Packages

Languages

License

G-Imola/Popularity-Feature-Analysis

Folders and files

Latest commit

History

Repository files navigation

Popularity ~ Feature Analysis for INF6027

Investigating quantitative features in music

Introduction

Investigating quantitative features in music:

A statistical analysis of popularity and its relationship to musical features.

Sourcing

Packages

Composite visualisations

Steps to run the code:

1. Prerequisites

2.Clone the repository and verify branch

3. Dataset Placement

4. Execute the script

5.Outputs

key findings

License

Contacts

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages