Natural Sound Geolocation Benchmark

This repo provides benchmark code for audio geolocation on the iNatSounds and Xeno-Canto Dawn Chorus (XCDC) datasets, and includes the official release of the XCDC dataset.

Figure 1: iNatSounds and XCDC predictions from around the world.

iNatSounds

Please see visipedia/inat_sounds for download instructions and documentation for iNatSounds. The dataset JSON files (for train, val and test) include the latitude and longitude of each audio recording, which serve as labels for geolocation.

The XCDC Dataset

We present the Xeno-Canto Dawn Chorus (XCDC) Dataset, a new benchmark for audio geolocation using wildlife sounds. The dawn chorus is a symphony of bird songs involving complex sounds. These communal vocal displays occur at dawn and are most noticeable during the breeding season. A rich diversity of species can be heard in a relatively short amount of time, possibly providing the cues needed to narrow down the location of a recording. We construct a new benchmark to test this hypothesis. From Xeno-Canto, we select audio that:

was recorded during spring-time dawn chorus.
is at least 3 minutes long.
contains at least 10 distinct species.

This yields 576 recordings with corresponding geographic locations and species annotations.

We release all preprocessed audio .wav files and their annotations here. The annotations also contain URLs to the original observations on Xeno-Canto, which can be used to access the raw recordings.

Annotation Format

The xcdc_recordings.csv file contains metadata and labels for the dataset. Each row corresponds to one data sample. Below is a description of each column.

Column Name	Description
`audio_id`	Unique ID for each audio recording
`url`	Link to the original audio on Xeno-Canto
`latitude`	Latitude coordinate where the audio was recorded
`longitude`	Longitude coordinate where the audio was recorded
`length`	Duration of the recording (MM:SS or HH:MM:SS)
`date`	Recording date
`time`	Recording time
`recordist`	Name of the person who recorded the audio
`main_scientific`	Scientific name of the primary species present in the recording
`main_common`	Common name of the primary species present in the recording
`all_scientific`	List of scientific names of all annotated species in the recording
`all_common`	List of common names of all annotated species in the recording
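
As a minimal sketch of how these columns might be consumed, the snippet below reads rows with Python's standard `csv` module and converts the `length` field (MM:SS or HH:MM:SS) to seconds. It assumes the file is comma-delimited with a header row matching the column names above; the helper names `length_to_seconds` and `read_annotations` and the sample rows are hypothetical, not part of this repo.

```python
import csv
import io


def length_to_seconds(length: str) -> int:
    """Convert an MM:SS or HH:MM:SS string to a number of seconds."""
    seconds = 0
    for part in length.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def read_annotations(handle):
    """Yield one dict per recording with numeric coordinates and duration."""
    for row in csv.DictReader(handle):
        yield {
            "audio_id": row["audio_id"],
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
            "seconds": length_to_seconds(row["length"]),
        }


# Made-up rows for illustration only; real use would pass
# open("xcdc_recordings.csv", newline="") instead (delimiter assumed).
sample = io.StringIO(
    "audio_id,latitude,longitude,length\n"
    "example_a,42.39,-72.53,03:15\n"
    "example_b,-33.87,151.21,1:02:03\n"
)
records = list(read_annotations(sample))
```

Since every XCDC recording is at least 3 minutes long, checking that `seconds` is 180 or more on each row is a cheap sanity test after loading.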

Note: A Croissant metadata file xcdc_croissant.json is also provided. To integrate it into your machine learning workflow, please refer to Hugging Face Croissant Guide.

Benchmarking Geolocation

To benchmark your method on these two datasets, please create a prediction CSV file with the following fields. See the ground-truth files for XCDC and iNatSounds for examples.

Column Name	Description
`audio_id`	Unique ID for each audio recording. This is also the filename of the audio recording (basename, without the extension)
`latitude`	Predicted Latitude coordinate
`longitude`	Predicted Longitude coordinate
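
A prediction file with these fields could be written along the following lines. This is only a sketch: it assumes a comma-delimited file with a header row and coordinates in decimal degrees, and the function name `write_predictions`, the output filename, and the example IDs are hypothetical. Check the ground-truth files for the exact layout that `evaluate_geo.py` expects.

```python
import csv
import os
import tempfile


def write_predictions(path, predictions):
    """Write (audio_id, latitude, longitude) tuples to a prediction CSV."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["audio_id", "latitude", "longitude"])
        for audio_id, lat, lon in predictions:
            # Reject coordinates that cannot be valid decimal degrees.
            if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
                raise ValueError(f"coordinates out of range for {audio_id}")
            writer.writerow([audio_id, f"{lat:.6f}", f"{lon:.6f}"])


# Made-up predictions; audio_id should be the audio file's basename
# without its extension.
example = [("example_a", 42.39, -72.53), ("example_b", -33.87, 151.21)]
out_path = os.path.join(tempfile.gettempdir(), "predictions_example.csv")
write_predictions(out_path, example)
```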

With this CSV, run

python3 evaluate_geo.py --pred_cvs <> --dataset <>

where the dataset argument is either xcdc or inat. This should output the metrics reported in the paper.
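
The metrics themselves are defined by `evaluate_geo.py` and the paper, and are not listed here. For intuition only, geolocation error is commonly measured as the great-circle distance between predicted and true coordinates. The sketch below computes that distance with the haversine formula and takes the median over recordings. It is an illustrative assumption, not a description of what the script computes, and the names `haversine_km` and `median_error_km` are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_error_km(predicted, truth):
    """Median distance over audio_ids present in both {id: (lat, lon)} dicts."""
    shared = sorted(set(predicted) & set(truth))
    if not shared:
        raise ValueError("no audio_id appears in both inputs")
    return statistics.median(
        haversine_km(*predicted[k], *truth[k]) for k in shared
    )


# Made-up coordinates for illustration.
truth = {"example_a": (0.0, 0.0), "example_b": (0.0, 0.0)}
predicted = {"example_a": (0.0, 0.0), "example_b": (0.0, 180.0)}
error = median_error_km(predicted, truth)
```

The median is used here because a few badly wrong predictions on the far side of the globe would dominate a mean.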

Citation

If you use the dataset and benchmark in your work, please consider citing the following papers:

@article{chasmai2024inaturalist,
    title={The iNaturalist Sounds Dataset},
    author={Chasmai, Mustafa and Shepard, Alexander and Maji, Subhransu and Van Horn, Grant},
    journal={Advances in Neural Information Processing Systems},
    volume={37},
    pages={132524--132544},
    year={2024}
}
