synthetic_lyrics_detection

This repository provides Python code to reproduce the experiments from the article Synthetic Lyrics Detection Across Languages and Genres, accepted for publication at the NAACL 2025 TrustNLP workshop.

Installation

git clone https://github.com/deezer/synthetic_lyrics_detection.git
cd synthetic_lyrics_detection

Build and Run the Docker Image

Build the Docker image and run it in a container with an interactive bash session.

Note: The current Docker image requires a CUDA-capable GPU.

make build
make run-bash

Data Generation Pipeline

Install Ollama and pull the required models:

curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull mistral && ollama pull tinyllama && ollama pull wizardlm2
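
Before running the pipeline, it can help to confirm that the server is up and that all three models were pulled. The sketch below is one possible check, not part of this repository: it assumes Ollama is listening on its default address (`http://localhost:11434`) and uses its `/api/tags` endpoint to list local models. The helper names `missing_models` and `list_local_models` are hypothetical.

```python
import json
import urllib.request

# Models pulled in the step above.
REQUIRED_MODELS = ("mistral", "tinyllama", "wizardlm2")


def missing_models(available_names, required=REQUIRED_MODELS):
    """Return the required models that are absent from the server's list.

    Ollama reports names with a tag suffix (e.g. "mistral:latest"),
    so only the part before the colon is compared.
    """
    available = {name.split(":", 1)[0] for name in available_names}
    return [model for model in required if model not in available]


def list_local_models(base_url="http://localhost:11434"):
    """Query the Ollama server for the names of locally available models."""
    # Assumption: the server listens on Ollama's default address.
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        payload = json.load(response)
    return [entry["name"] for entry in payload.get("models", [])]


if __name__ == "__main__":
    missing = missing_models(list_local_models())
    if missing:
        print("Missing models:", ", ".join(missing))
    else:
        print("All required models are available.")
```

If a model is reported as missing, re-running the corresponding `ollama pull` command should resolve it.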

Run the data generation pipeline:

python3 data_pipeline/run_pipeline.py <input_json_file_with_human_written_lyrics> output/

Note: Replace <input_json_file_with_human_written_lyrics> with the path to your JSON file containing human-written lyrics.
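
A small pre-flight wrapper can catch a wrong path or a malformed file before the pipeline starts calling the models. The sketch below is an illustration, not part of this repository: it only checks that the input exists and parses as JSON (the schema expected by `run_pipeline.py` is not validated), then launches the command shown above. The function name `build_pipeline_command` and the script name `preflight.py` are hypothetical.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_pipeline_command(input_json, output_dir="output/"):
    """Build the documented pipeline command after basic input checks."""
    input_path = Path(input_json)
    if not input_path.is_file():
        raise FileNotFoundError(f"Input file not found: {input_path}")
    # Only checks that the file parses as JSON; the schema is not validated.
    with input_path.open(encoding="utf-8") as handle:
        json.load(handle)
    return ["python3", "data_pipeline/run_pipeline.py", str(input_path), output_dir]


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 preflight.py <input_json_file>
    command = build_pipeline_command(sys.argv[1])
    subprocess.run(command, check=True)
```

Run it from the repository root so that the relative path to `data_pipeline/run_pipeline.py` resolves.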

Synthetic Lyrics Detection

Please refer to this repository, which contains the detectors and scripts needed to run the experiments.

Paper

Please cite our paper if you use this data or code in your work:

@inproceedings{labrak2024detecting,
  author    = {Labrak, Yanis and
               Frohmann, Markus and
               Meseguer-Brocal, Gabriel and
               Epure, Elena V.},
  booktitle = {Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)},
  editor    = {Cao, Trista and
               Das, Anubrata and
               Kumarage, Tharindu and
               Wan, Yixin and
               Krishna, Satyapriya and
               Mehrabi, Ninareh and
               Dhamala, Jwala and
               Ramakrishna, Anil and
               Galystan, Aram and
               Kumar, Anoop and
               Gupta, Rahul and
               Chang, Kai-Wei},
  isbn      = {979-8-89176-233-6},
  month     = may,
  pages     = {524--541},
  publisher = {Association for Computational Linguistics},
  title     = {Synthetic Lyrics Detection Across Languages and Genres},
  url       = {https://aclanthology.org/2025.trustnlp-main.34/},
  year      = {2025},
  address   = {Albuquerque, New Mexico},
  abstract  = {In recent years, the use of large language models (LLMs) to generate music content, particularly lyrics, has gained in popularity. These advances provide valuable tools for artists and enhance their creative processes, but they also raise concerns about copyright violations, consumer satisfaction, and content spamming. Previous research has explored content detection in various domains. However, no work has focused on the text modality, lyrics, in music. To address this gap, we curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists. The generation pipeline was validated using both humans and automated methods. We performed a thorough evaluation of existing synthetic text detection approaches on lyrics, a previously unexplored data type. We also investigated methods to adapt the best-performing features to lyrics through unsupervised domain adaptation. Following both music and industrial constraints, we examined how well these approaches generalize across languages, scale with data availability, handle multilingual language content, and perform on novel genres in few-shot settings. Our findings show promising results that could inform policy decisions around AI-generated music and enhance transparency for users.}
}
