Predicting Beats Per Minute in Songs

This repository contains the code and data for the Kaggle Playground Series (Season 5, Episode 9) competition. The goal is to predict the Beats Per Minute (BPM) of a song from a set of musical and audio features.

Project Overview

This is a regression task: given a set of audio features for each song, build a model that predicts the target column, BeatsPerMinute, as accurately as possible. The workflow covers exploratory data analysis, preprocessing to handle skewed features, and a Linear Regression baseline, with more advanced regression models planned. All analysis and model development are done in the notebook.ipynb Jupyter Notebook.

Dataset

The data for this competition is provided in two files:

datasets/train.csv: The training set, which includes all the audio features as well as the target variable, BeatsPerMinute.
datasets/test.csv: The test set, which contains the same features as the training set but without the target variable.

Features

The dataset includes the following audio features:

RhythmScore
AudioLoudness
VocalContent
AcousticQuality
InstrumentalScore
LivePerformanceLikelihood
MoodScore
TrackDurationMs
Energy

Methodology

The approach taken in notebook.ipynb follows a standard machine learning workflow:

Exploratory Data Analysis (EDA):

The data is loaded and inspected for missing values and data types. The distributions of all numerical features are visualized using histograms to identify skewness.
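
The inspection step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the summarize helper is a hypothetical name, and the demo frame is synthetic data standing in for datasets/train.csv. Skewness is computed numerically here as a complement to the histograms.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing-value count, and skewness for each numeric column."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame(
        {
            "dtype": numeric.dtypes.astype(str),
            "missing": numeric.isna().sum(),
            "skew": numeric.skew(),
        }
    )


# Synthetic stand-in for datasets/train.csv (values are made up for illustration).
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "RhythmScore": rng.uniform(0.0, 1.0, 1000),   # roughly symmetric
        "VocalContent": rng.exponential(0.1, 1000),   # right-skewed
    }
)
summary = summarize(demo)
print(summary)

# With the real data one would instead run something like:
#   df = pd.read_csv("datasets/train.csv")
#   df.hist(bins=50, figsize=(12, 8))  # histograms; requires matplotlib
```

A clearly positive skew value flags a right-skewed column, which is what motivates the log transformation in the preprocessing step.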

Data Preprocessing:

Log Transformation: The right-skewed features (VocalContent, AcousticQuality, InstrumentalScore, LivePerformanceLikelihood) are transformed with np.log1p, which compresses the long right tail and makes their distributions closer to symmetric.
Train-Validation Split: The training data is split into a training set and a validation set before any scaling is fitted, so that model performance is measured on data that neither the model nor the scaler has seen.
Feature Scaling: StandardScaler (Z-score scaling) is used to scale all features. The scaler is fitted only on the training data and then used to transform both the training and validation sets.

Modeling:

A baseline model using Linear Regression has been built. The plan is to experiment with more advanced regression models such as SVR, LightGBM, XGBoost, and Neural Networks to capture more complex, non-linear patterns in the data.
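
The preprocessing and baseline steps described above could look roughly like this. It is a sketch under stated assumptions rather than the notebook's code: fit_baseline is a hypothetical helper, the 80/20 split ratio and the random seed are arbitrary choices, RMSE is assumed as the validation metric, an "id" column is dropped if present, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "RhythmScore", "AudioLoudness", "VocalContent", "AcousticQuality",
    "InstrumentalScore", "LivePerformanceLikelihood", "MoodScore",
    "TrackDurationMs", "Energy",
]
SKEWED = ["VocalContent", "AcousticQuality", "InstrumentalScore", "LivePerformanceLikelihood"]
TARGET = "BeatsPerMinute"


def fit_baseline(df, test_size=0.2, seed=42):
    """Log-transform skewed columns, split, scale, and fit Linear Regression."""
    # Assumption: any "id" column is an identifier, not a feature.
    X = df.drop(columns=[TARGET, "id"], errors="ignore").copy()
    y = df[TARGET]

    # log1p is applied row by row, so it can safely run before the split.
    X[SKEWED] = np.log1p(X[SKEWED])

    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # Fit the scaler on the training rows only, then reuse it for validation.
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_val_s = scaler.transform(X_val)

    model = LinearRegression().fit(X_train_s, y_train)
    rmse = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val_s))))
    return model, scaler, rmse


# Synthetic demo data (made up; not the competition data).
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({name: rng.uniform(0.0, 1.0, n) for name in FEATURES})
for name in SKEWED:
    demo[name] = rng.exponential(0.1, n)
demo[TARGET] = 120 + 10 * demo["RhythmScore"] + rng.normal(0.0, 1.0, n)

model, scaler, rmse = fit_baseline(demo)
print(f"validation RMSE: {rmse:.3f}")
```

Fitting the scaler after the split matters: if the mean and standard deviation were computed on all rows, statistics from the validation set would leak into training and the validation score would look slightly better than it should.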

Repository Structure:

 ├── datasets
 │   ├── train.csv
 │   └── test.csv
 ├── notebook.ipynb
 ├── submission.csv
 └── README.md

datasets/: Contains the raw training and test data.
notebook.ipynb: The main Jupyter Notebook with all the analysis and modeling code.
submission.csv: A sample submission file in the format required by Kaggle.
README.md: This file, providing an overview of the project.
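
A submission file could be produced along these lines. This is only a sketch: make_submission is a hypothetical helper, the identifier column name "id" is an assumption that should be checked against the sample submission provided by Kaggle, and the ids and predictions below are placeholder values.

```python
import io

import numpy as np
import pandas as pd


def make_submission(ids, predictions) -> pd.DataFrame:
    """Pair each test-row identifier with its predicted BPM."""
    # Assumption: the expected columns are "id" and "BeatsPerMinute".
    return pd.DataFrame({"id": ids, "BeatsPerMinute": predictions})


# Placeholder values; real predictions would come from a fitted model.
submission = make_submission([0, 1, 2], np.array([118.5, 121.0, 119.25]))

# Written to an in-memory buffer here; pass "submission.csv" to write a file.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```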

How to Use

To run this project:

Clone the repository:

 git clone https://github.com/smusab9152/bpm_pred_songs.git
 cd bpm_pred_songs

Install dependencies:

It is recommended to use a virtual environment (for example, conda).

pip install pandas numpy scikit-learn matplotlib jupyter

Run the Jupyter Notebook:

jupyter notebook

Then, open and run the cells in notebook.ipynb.

Reference

Kaggle Competition Link

Python 3.8.0
