
PeakNetFP

Official repository of PeakNetFP: Peak-based Neural Audio Fingerprinting Robust to Extreme Time Stretching by Guillem Cortès-Sebastià, Benjamin Martin, Emilio Molina, Xavier Serra, and Romain Hennequin, to be presented at ISMIR 2025. The preprint is available on arXiv.

Getting started

PeakNetFP runs in a Docker container. The same container can be used to run both PeakNetFP and NeuralFP.

  1. Clone the repository.

  2. Build the Docker image. Modify docker-compose.yml if you want to change the image or container names, for instance.

    docker-compose build
  3. Create and start the container.

    docker-compose up -d
  4. Open an interactive shell.

    docker exec -it peaknetfp /bin/bash
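
Once inside the container, you can optionally confirm that a GPU is visible. This is just a quick sanity check, assuming the image ships TensorFlow with GPU support (as used for training):

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"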

Getting the data

To reproduce the experiments in this publication, two datasets are needed:

Train and validation

We use the dataset from Neural-Audio-FP (https://github.com/mimbres/neural-audio-fp?tab=readme-ov-file#dataset) for training and validation. The scalability of the approach is tested by adding the tracks of test-dummy-db-100k-full to the reference DB, hence the importance of downloading the full neural-audio-fp dataset, available on IEEE DataPort (https://ieee-dataport.org/open-access/neural-audio-fingerprint-dataset). We train with the train-10k-30s subset, while val-query-db-500-30s is used for validation.

Test

The test dataset is created in this publication from the neural-audio-fp test set and is available in DOI. It contains the tracks of test-query-db-500-30s stretched by the following factors: 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.975, 1 (no stretch), 1.05, 1.1, 1.2, 1.4, 1.6, 1.8, and 2, using the tempo functionality of SoX (https://sourceforge.net/projects/sox/); a sketch of this stretching step is shown after the tree below. The downloaded query_stretch folder must be placed inside the test-query-db-500-30s folder of the neural-audio-fp dataset, so the tree structure of the dataset directory should look like this:

neural-audio-fp-dataset
├── aug
├── extras
├── LICENSE
├── music
│   ├── LICENSE-fma
│   ├── others
│   ├── test-dummy-db-100k-full
│   ├── test-query-db-500-30s
│   │   ├── db
│   │   ├── query
│   │   ├── query_stretch   --> PeakNetFP stretched test set
│   │   └── test_ids_icassp2021.npy
│   ├── train-10k-30s
│   └── val-query-db-500-30s
└── README.md
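
For reference, here is a minimal sketch of how such stretched copies can be produced with the tempo effect of SoX. The paths and the output-folder naming are assumptions (only the tempo-0_900 pattern appears later in this README); to reproduce the paper's results, use the distributed query_stretch folder rather than regenerating it.

import subprocess
from pathlib import Path

# Hypothetical paths; adjust to your local copy of the dataset.
SRC = Path("/datasets/neural-audio-fp-dataset/music/test-query-db-500-30s/query")
DST = Path("/datasets/neural-audio-fp-dataset/music/test-query-db-500-30s/query_stretch")
FACTORS = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.975, 1.0, 1.05, 1.1, 1.2, 1.4, 1.6, 1.8, 2.0]

for factor in FACTORS:
    # Assumed folder naming, following the tempo-0_900 example used in Generate below.
    out_dir = DST / f"tempo-{factor:.3f}".replace(".", "_")
    out_dir.mkdir(parents=True, exist_ok=True)
    for wav in sorted(SRC.glob("*.wav")):
        # "sox in.wav out.wav tempo 0.9" changes speed without altering pitch.
        subprocess.run(["sox", str(wav), str(out_dir / wav.name), "tempo", str(factor)],
                       check=True)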

Model weights

Model checkpoints are available in DOI. Checkpoints have to be stored in the root folder of this repo under logs/checkpoint.

Train

  • First, check which batch size fits on your device.
  • Check the config file and adjust the parameters accordingly.
python run.py train -c config/<config_file.yml>

Generate

Example with stretch factor 0.9:

python run.py generate <exp_name> <ckpt_index> \
     -c config/<config_file.yaml> \
     -o logs/emb/<exp_name>/tempo-0_900/<ckpt_index> \
     --query_dir /datasets/neural-audio-fp-dataset/music/test-query-db-500-30s/query_stretch/tempo-0_900

It will generate the following files:

.
└──logs
   └── emb
       └── CHECKPOINT_NAME
           └── CHECKPOINT_INDEX
               ├── db.mm
               ├── db_shape.npy
               ├── dummy_db.mm
               ├── dummy_db_shape.npy
               ├── query.mm
               ├── query_shape.npy
               ├── query_segments.csv
               ├── db_segments.csv
               └── dummy_db_segments.csv 

With the default config, generate will produce embeddings (or fingerprints) from 'dummy_db', test_query, and test_db. The generated embeddings will be located in logs/emb/<exp_name>/<ckpt_index>/**.mm and **.npy.
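
These .mm files are raw memory-mapped arrays whose shapes are stored in the companion *_shape.npy files. Below is a minimal sketch of loading them with NumPy, assuming float32 embeddings (adjust the dtype if your configuration differs):

import numpy as np

emb_dir = "logs/emb/<exp_name>/<ckpt_index>"  # same layout as shown above

# Each *_shape.npy stores the (num_fingerprints, embedding_dim) shape
# of the corresponding memory-mapped .mm array.
db_shape = tuple(np.load(f"{emb_dir}/db_shape.npy"))
db = np.memmap(f"{emb_dir}/db.mm", dtype="float32", mode="r", shape=db_shape)

query_shape = tuple(np.load(f"{emb_dir}/query_shape.npy"))
query = np.memmap(f"{emb_dir}/query.mm", dtype="float32", mode="r", shape=query_shape)

print(db.shape, query.shape)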

  • dummy_db is generated from the test set specified in the config file.
  • In the DATASEL section of config, you can select options for a pair of db and query generation. The default is unseen_icassp, which uses a pre-defined test set.
  • It is possible to generate only the db and query pairs with the --skip_dummy option. This option is frequently used to avoid overwriting the time-consuming dummy_db fingerprints in every experiment.
  • It is also possible to generate embeddings (or fingerprints) from your custom source using the -s or --source argument during generation.

Evaluate

Example with stretch factor 0.9:

  • PeakNetFP performs song-level search, but we keep the segment-level search code from NeuralFP for backward compatibility; a conceptual sketch of song-level retrieval follows the command below.
python run.py evaluate <exp_name> <ckpt_index> \
     -c config/<config_file.yaml> \
     --emb_dir logs/emb/neuralfp/tempo-0_900/<ckpt_index> \
     --stretch_factor 0.9
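
For intuition only, the sketch below shows one simple way to score song-level top-1 retrieval with brute-force similarity search. It is not the repository's evaluation code (which relies on its own indexing and the *_segments.csv files); all names are hypothetical.

import numpy as np

def song_level_top1_hit_rate(query_emb, query_song_ids, db_emb, db_song_ids):
    """Fraction of queries whose nearest reference fingerprint belongs to the correct song.

    query_emb: (Nq, d) L2-normalized query fingerprints
    db_emb:    (Nd, d) L2-normalized reference fingerprints
    *_song_ids: array of song labels, one per fingerprint
    """
    sims = query_emb @ db_emb.T          # (Nq, Nd) cosine similarities
    nearest = sims.argmax(axis=1)        # best-matching reference fingerprint per query
    hits = db_song_ids[nearest] == query_song_ids
    return float(hits.mean())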

Other

Tensorboard

Run a dedicated Docker container to check TensorBoard:

docker run --network=host -ti --rm --name tensorboard -v <vol_path>:<vol_path> tensorflow/tensorflow

then run:

tensorboard --logdir <vol_path>/neuralfp/logs/fit --port <port> --bind_all

Notes

Main developments with respect to the original NeuralFP repository:

  • Fixed some parts of the code, such as the dataset generation loader.
  • Corrected the data normalization: each mel spectrogram is now normalized individually, whereas before normalization was applied at batch level (see the sketch after this list).
  • Upgraded packages to newer versions, since the original code with tensorflow==2.4.1 cannot run on newer GPU cards; we upgraded to version 2.15.
  • Added peak-spectrogram and Gaussian-spectrogram transformations to the input data.
  • Added stretching augmentation.
  • Included the PointNet++ code used to build PeakNetFP.
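
A minimal sketch of the normalization change mentioned above, using NumPy for illustration only (per-spectrogram standardization is an assumption, not the repository's exact implementation):

import numpy as np

def normalize_per_melspec(batch, eps=1e-8):
    """Current behavior: normalize each mel spectrogram independently."""
    # batch: (B, n_mels, n_frames)
    mean = batch.mean(axis=(1, 2), keepdims=True)
    std = batch.std(axis=(1, 2), keepdims=True)
    return (batch - mean) / (std + eps)

def normalize_per_batch(batch, eps=1e-8):
    """Previous behavior: one set of statistics computed over the whole batch."""
    return (batch - batch.mean()) / (batch.std() + eps)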

Acknowledgements

This research is part of resCUE – Smart system for automatic usage reporting of musical works in audiovisual productions (SAV-20221147) funded by CDTI and the European Union - Next Generation EU, and supported by the Spanish Ministerio de Ciencia, Innovación y Universidades and the Ministerio para la Transformación Digital y de la Función Pública. Furthermore, it has received support from the Industrial Doctorates plan of the Secretaria d’Universitats i Recerca, Departament d’Empresa i Coneixement de la Generalitat de Catalunya, grant agreement No. DI46-2020.

Citation

To be updated once the proceedings are published:

G. Cortès-Sebastià, B. Martin, E. Molina, X. Serra, R. Hennequin, “PeakNetFP: Peak-based Neural Audio Fingerprinting Robust to Extreme Time Stretching”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.

@article{cortes2025peaknetfp,
  title={PeakNetFP: Peak-based Neural Audio Fingerprinting Robust to Extreme Time Stretching},
  author={Cort{\`e}s-Sebasti{\`a}, Guillem and Martin, Benjamin and Molina, Emilio and Serra, Xavier and Hennequin, Romain},
  journal={arXiv preprint arXiv:2506.21086},
  year={2025},
  note={Accepted at ISMIR 2025}
}
