Children's Songs Recommendation

Description

The project consists of five parts:

\flask-song-app\ A web application that provides recommendations for children's songs. Users select a song, and the app recommends similar songs. Deployed to Heroku: https://children-song-app.herokuapp.com/
\notebooks\1_* Data scraping and cleaning.
\notebooks\2_* Investigation of the relationship between age ratings and audio features, plus NLP analysis of the lyrics.
\notebooks\3_* Age-Rating Models using audio features and lyrics.
\notebooks\4_* Song Recommender.

2,206 albums with age ratings were scraped from Common Sense Media, a non-profit whose mission is to ensure digital well-being for kids by providing expert reviews. See more info on how music is rated.
The Spotify API was used to obtain the tracks in each album. In total, 18K songs were found, along with their audio features and ISRC codes.
The MusixMatch API, queried with these ISRC codes, returned lyrics for 12K songs.
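Because lyrics were matched to tracks by ISRC, the two datasets can be joined on that code. A minimal pandas sketch, assuming the scraped tracks and the fetched lyrics already sit in two DataFrames; the column names and ISRC values are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-ins for the scraped data (not real project rows).
tracks = pd.DataFrame({
    "track_name": ["Song A", "Song B", "Song C"],
    "isrc": ["USAAA0000001", "USAAA0000002", "USAAA0000003"],
    "age_rating": [3, 8, 14],  # album age rating copied onto each of its tracks
})
lyrics = pd.DataFrame({
    "isrc": ["USAAA0000001", "USAAA0000003"],
    "lyrics": ["twinkle twinkle little star", "dancing all night"],
})

# Left join keeps every track; `lyrics` is NaN where no lyrics were found.
# validate= raises if an ISRC is duplicated on either side.
songs = tracks.merge(lyrics, on="isrc", how="left", validate="one_to_one")
songs_with_lyrics = songs.dropna(subset=["lyrics"])
print(len(songs), len(songs_with_lyrics))
```

The left join mirrors the counts above: every track keeps its audio features, and only the subset with a lyrics match is used for the lyrics-based steps.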

Two Age-Rating Models:
- Use audio features to predict age ratings.
  
  A tree regression model uses 13 audio features (key, tempo, duration, etc.; explained here) plus popularity to predict age ratings. The model achieves an R^2 score of 0.50, with popularity and duration as the two most important features.
- Use song lyrics to predict age ratings.
  
  After basic text preprocessing (tokenization, lemmatization, stop-word removal), the processed lyrics are fed into a model pipeline consisting of a TfidfVectorizer and a Ridge regressor. GridSearchCV is run on a smaller subset to select the parameters min_df and max_df for the TfidfVectorizer and alpha for the Ridge regressor. The selected TfidfVectorizer parameters are reused later for the lyrics-based song recommendation with the KNN model, which has no metric to tune them against.
  
  The model achieves an R^2 score of 0.4.
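A sketch of the audio-feature model on synthetic data in which the rating is driven by popularity and duration by construction. Using a single scikit-learn `DecisionTreeRegressor` with `max_depth=6` is an assumption (the description only says "tree regression model"), and the R^2 it prints reflects the synthetic data, not the project's 0.50:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

AUDIO_FEATURES = ["key", "mode", "time_signature", "duration_ms", "danceability",
                  "energy", "loudness", "speechiness", "acousticness",
                  "instrumentalness", "liveness", "valence", "tempo"]
FEATURES = AUDIO_FEATURES + ["popularity"]

rng = np.random.default_rng(0)
n = 2000
X = rng.random((n, len(FEATURES)))  # synthetic features in [0, 1)
# Synthetic target: mostly popularity and duration, plus noise (demo assumption).
y = (2 + 10 * X[:, FEATURES.index("popularity")]
     + 6 * X[:, FEATURES.index("duration_ms")]
     + rng.normal(0, 1.0, n))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_train, y_train)

r2 = model.score(X_test, y_test)  # .score() returns R^2 for regressors
top2 = sorted(zip(model.feature_importances_, FEATURES), reverse=True)[:2]
top_names = [name for _, name in top2]
print(f"R^2 = {r2:.2f}", top_names)
```

`feature_importances_` is what "two most important features" refers to for a tree model: the total impurity reduction contributed by splits on each feature.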
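The lyrics model can be sketched as a scikit-learn `Pipeline` searched with `GridSearchCV`. The corpus, the age labels, the grid values, and the 3-fold shuffled CV are hypothetical placeholders; only the tuned parameter names (min_df, max_df, alpha) come from the description above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline

# Tiny hypothetical corpus of already-preprocessed lyrics (lemmatized, no stop words).
docs = [
    "little star twinkle sky", "abc sing together friend", "wheel bus round town",
    "duck swim pond quack", "baby shark family swim", "happy clap hand sing",
    "heart break night cry", "party night dance club", "love kiss night lonely",
    "money fame chase dream", "cry alone rain night", "dance floor party love",
]
ages = [2, 3, 2, 2, 3, 3, 13, 14, 13, 15, 12, 14]  # hypothetical age ratings

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("ridge", Ridge())])
param_grid = {
    "tfidf__min_df": [1, 2],          # drop terms seen in fewer documents
    "tfidf__max_df": [0.5, 1.0],      # drop terms seen in too large a fraction
    "ridge__alpha": [0.1, 1.0, 10.0], # L2 regularization strength
}
search = GridSearchCV(pipe, param_grid, scoring="r2",
                      cv=KFold(n_splits=3, shuffle=True, random_state=0))
search.fit(docs, ages)

# Keep the tuned vectorizer settings for reuse in the KNN recommender.
best_tfidf = search.best_estimator_.named_steps["tfidf"]
knn_tfidf_params = {k: best_tfidf.get_params()[k] for k in ("min_df", "max_df")}
print(search.best_params_, knn_tfidf_params)
```

Carrying `knn_tfidf_params` forward is the point of the design: the regression target gives the vectorizer settings something to be scored against, which the unsupervised KNN step lacks.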
Song Recommendation: a K-Nearest Neighbors model using the following features:
- Audio features: key, mode, time_signature, duration_ms, danceability, energy, loudness, speechiness, acousticness, instrumentalness, liveness, valence, tempo. Explained here.
- Song popularity: a number between 0 and 100, provided by the Spotify API, computed from the total number of plays the track has had and how recent those plays are.
- Age rating of the album containing the track.
- Song lyrics.
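One way to combine these feature types in a K-nearest-neighbors recommender: standardize the numeric features, TF-IDF-encode the lyrics, concatenate the two, and query scikit-learn's `NearestNeighbors`. The songs, the reduced feature subset, the equal weighting of the numeric and text parts, and the `recommend` helper are all hypothetical; this is not necessarily how the app builds its feature matrix:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

titles = ["Twinkle Star", "Wheels on the Bus", "Baby Shark", "Night Club", "Lonely Rain"]
# Hypothetical numeric features: [danceability, energy, tempo, popularity, age_rating]
numeric = np.array([
    [0.55, 0.30, 100.0, 40, 2],
    [0.60, 0.35, 110.0, 45, 2],
    [0.70, 0.50, 115.0, 80, 3],
    [0.80, 0.85, 125.0, 70, 14],
    [0.75, 0.75, 120.0, 65, 13],
])
lyrics = ["twinkle little star sky", "wheel bus round town", "baby shark swim family",
          "party night dance club", "cry alone rain night"]

num_part = StandardScaler().fit_transform(numeric)              # zero mean, unit variance
text_part = TfidfVectorizer().fit_transform(lyrics).toarray()   # L2-normalized rows
X = np.hstack([num_part, text_part])

knn = NearestNeighbors(n_neighbors=3).fit(X)  # Euclidean distance by default

def recommend(idx, k=2):
    """Hypothetical helper: titles of the k songs closest to song `idx`."""
    # Ask for k + 1 neighbours because the query song is its own nearest neighbour.
    _, neighbours = knn.kneighbors(X[idx:idx + 1], n_neighbors=k + 1)
    return [titles[j] for j in neighbours[0] if j != idx][:k]

print(recommend(0), recommend(3))
```

Scaling matters here: without `StandardScaler`, tempo (around 100) and popularity (0 to 100) would swamp the 0-to-1 audio features and the TF-IDF components in the distance.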

What's the relationship between age ratings and audio features?
- Age rating is most correlated with popularity and duration (popularity reflects how much, and how recently, the track has been played; duration is the length of the song). (See correlation plot or 2.2_Variables_Relation.ipynb)
- Melodic modes (major vs. minor): in the 2-5 age group, more than 80% of the songs are in major keys, while in the 13-18 age group only 65% are. (See plot or 2.2_Variables_Relation.ipynb)
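Both findings reduce to a correlation and a grouped mean. A pandas sketch on a handful of made-up rows, assuming Spotify's `mode` convention (1 = major, 0 = minor); the bin edges, including the 6-12 group, are assumptions, since only the 2-5 and 13-18 groups are named above:

```python
import pandas as pd

# Made-up tracks; `mode` is 1 for major and 0 for minor.
songs = pd.DataFrame({
    "age_rating":  [2, 3, 4, 5, 13, 14, 16],
    "popularity":  [20, 35, 30, 40, 70, 65, 80],
    "duration_ms": [120_000, 100_000, 150_000, 130_000, 210_000, 240_000, 200_000],
    "mode":        [1, 1, 1, 0, 1, 0, 1],
})

# Pearson correlation of each feature with the age rating.
corr_with_age = songs[["age_rating", "popularity", "duration_ms"]].corr()["age_rating"]

# Bin ratings into age groups (hypothetical edges), then % of songs in major keys.
songs["age_group"] = pd.cut(songs["age_rating"], bins=[1, 5, 12, 18],
                            labels=["2-5", "6-12", "13-18"])
pct_major = songs.groupby("age_group", observed=True)["mode"].mean().mul(100).round(1)
print(corr_with_age)
print(pct_major)
```

Because `mode` is a 0/1 column, its mean within a group is exactly the share of major-key songs.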
What's the relationship between age ratings and lyrics?
- Visualized by plotting word polarity: the lyrics are divided into two age groups (young vs. old), and each word's conditional log-probabilities in the two groups serve as its (x, y) coordinates. Neutral words lie approximately on the line x = y. (See plot or 2.5_NLP_Visualize_Lyrics_Word_Polarity.ipynb)
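A stdlib-only sketch of the word-polarity coordinates: x is log P(word | young) and y is log P(word | old). The add-one (Laplace) smoothing and the toy lyrics are assumptions; the notebook may estimate these probabilities differently:

```python
import math
from collections import Counter

young = ["twinkle little star", "little duck swim", "sing little song"]
old = ["party all night", "night love song", "dance all night"]

def log_probs(docs, vocab, alpha=1.0):
    """log P(word | group) with add-alpha smoothing over a shared vocabulary."""
    counts = Counter(w for d in docs for w in d.split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}

vocab = sorted({w for d in young + old for w in d.split()})
x = log_probs(young, vocab)  # x-coordinate: young group
y = log_probs(old, vocab)    # y-coordinate: old group
for w in vocab:
    # y - x > 0 leans "old", < 0 leans "young", ~0 sits on the line x = y.
    print(f"{w:8s} x={x[w]:.2f} y={y[w]:.2f} polarity={y[w] - x[w]:+.2f}")
```

Smoothing keeps words that appear in only one group finite on the log scale, so they can still be plotted.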
What do the albums sing about?
- LDA topic modeling is used to identify 10 topics across all lyrics, each described by its topic keywords. (See 2.4_NLP_Topic_Modeling_Using_Song_Lyrics.ipynb)
