The final corpus is lyrics-dataset-updated-v2.xlsx (the "reject modernity, embrace tradition" version of version control).
The code for creating the corpus (web scraping, lyrics collection, and feature collection) can be found in collect_dataset.ipynb and explore_dataset.ipynb in the dataset collection folder.
The code for sentiment analysis and graphs can be found in experiments_pt_3.ipynb, which also links to Colab.
I apologize for the lack of clear comments/markdown. Mayhaps I will fix this at a future date.
For your viewing enjoyment, I have also added experiments.ipynb and experiments_pt2.ipynb to this repo. These files contain no code that I ended up using in the project; they are simply me figuring out what I could do with the lyrics and how I wanted to do it. If you are a brat fan, I especially recommend checking out experiments.ipynb.
If you're wondering what the brat folder is: I tested my code on the brat album first because it was a much smaller sample than the Grammy Awards corpus I was planning to collect.
If you're wondering what joining dataset.ipynb is: by the time the 2025 nominations were announced, I had already collected my dataset, and I didn't want to rerun the code from 1960 onward because my laptop almost died the first time. Instead, I used the brat code to generate the same dataset for 2025 only and appended it to my existing data. Thank you, Charli xcx!
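For anyone wanting to do a similar update, here is a minimal pandas sketch of appending one new year of data to an existing corpus. It assumes both tables share the same columns; the column names (year, artist, song, lyrics), the placeholder rows, and the helper append_year are all hypothetical, and joining dataset.ipynb may do this differently.

```python
import pandas as pd


def append_year(existing: pd.DataFrame, new_rows: pd.DataFrame) -> pd.DataFrame:
    """Stack new rows under an existing corpus (hypothetical helper)."""
    # Reorder the new rows to match the existing column order;
    # this raises KeyError if a column is missing from new_rows.
    new_rows = new_rows[existing.columns]
    # Concatenate and renumber the index 0..n-1.
    combined = pd.concat([existing, new_rows], ignore_index=True)
    # Drop exact duplicate rows in case anything was collected twice.
    return combined.drop_duplicates(ignore_index=True)


# Placeholder data with hypothetical columns, not the real corpus.
existing = pd.DataFrame({
    "year": [2024, 2024],
    "artist": ["Artist A", "Artist B"],
    "song": ["Song A", "Song B"],
    "lyrics": ["...", "..."],
})
new_2025 = pd.DataFrame({
    "artist": ["Artist C"],
    "song": ["Song C"],
    "lyrics": ["..."],
    "year": [2025],
})

combined = append_year(existing, new_2025)
print(combined)
```

To save the result, combined.to_excel("some_new_file.xlsx", index=False) writes a spreadsheet, provided an Excel writer engine such as openpyxl is installed.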