DRASDIC

This is the public repo for DRASDIC: Domain Randomization for Animal Sound Detection in Context.

Authors: Benjamin Hoffman, David Robinson, Marius Miron, Vittorio Baglione, Daniela Canestrari, Damian Elias, Eva Trapote, Felix Effenberger, Maddie Cusimano, Masato Hagiwara, Olivier Pietquin

Quick links:

Paper (LINK TO COME)
Appendix
Model weights
Fewshot Animal Sound Detection 13 (FASD13) evaluation dataset
Example Data (Train+FASD13)
Colab Demo

Setup

We manage packages with uv. To install the package in editable mode, run:

uv pip install -e .

Download the model weights (see Quick links) and place them in the weights folder. The example below loads its configuration from weights/drasdic_args.yaml.

Inference API

Example usage:

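from drasdic.inference.interface import InferenceInterface
from drasdic.inference.inference_utils import load_audio
import pandas as pd

# Hypothetical placeholder paths: replace with your own files
LABELED_AUDIO_FP = "path/to/labeled_audio.wav"
UNLABELED_AUDIO_FP = "path/to/unlabeled_audio.wav"
LABELED_AUDIO_ST_FP = "path/to/labeled_audio_selection_table.txt"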

# Initialize interface
interface = InferenceInterface('weights/drasdic_args.yaml')

# Load labeled and unlabeled audio
labeled_audio = load_audio(LABELED_AUDIO_FP)
unlabeled_audio = load_audio(UNLABELED_AUDIO_FP)

# Load selection table for labeled audio
# Interface assumes the loaded selection table has columns:
# "Begin Time (s)", "End Time (s)", and "Annotation"
st = pd.read_csv(LABELED_AUDIO_ST_FP, sep='\t')

# Load support audio and compute features
# This generates one thirty-second prompt per POS event in the support audio
# Inference time scales linearly with the number of prompts
interface.load_support_long(labeled_audio, st, pos_label="POS")

# Subsample support audio prompts to reduce inference time (if desired)
interface.subsample_support_clips(5)

# Predict frame-based logits for unlabeled audio
logits = interface.predict_logits(unlabeled_audio, batch_size=8)

# Convert logits to selection table
predicted_st = interface.logits_to_selection_table(logits, threshold=0.5)
print(predicted_st)
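
The interface expects the support selection table to be tab-separated with "Begin Time (s)", "End Time (s)" and "Annotation" columns (Raven-style). As a minimal sketch using only the Python standard library, the snippet below writes such a table and reads back the events with a given label. The helper names, the file name support.txt, the label "OTHER" and the event times are all hypothetical; a file written this way should also load with the pd.read_csv(..., sep='\t') call shown above.

```python
import csv
import tempfile
from pathlib import Path

# Column names the inference interface expects (from the example above)
COLUMNS = ["Begin Time (s)", "End Time (s)", "Annotation"]


def write_selection_table(path, events):
    """Write (begin_s, end_s, label) tuples as a tab-separated selection table."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(COLUMNS)
        for begin, end, label in events:
            writer.writerow([f"{begin:.3f}", f"{end:.3f}", label])


def read_events(path, label="POS"):
    """Return (begin_s, end_s) pairs for rows whose Annotation equals `label`."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return [
            (float(row["Begin Time (s)"]), float(row["End Time (s)"]))
            for row in reader
            if row["Annotation"] == label
        ]


# Hypothetical support annotations: two POS calls and one unrelated sound
events = [(1.25, 1.80, "POS"), (4.00, 4.35, "OTHER"), (7.10, 7.90, "POS")]
path = Path(tempfile.mkdtemp()) / "support.txt"
write_selection_table(path, events)
print(read_events(path))  # [(1.25, 1.8), (7.1, 7.9)]
```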

Further details

See the Colab demo for inference with multiple label types and files.

For full details, see documentation in drasdic/inference/interface.py.
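
For intuition about the last step of the example, the sketch below shows one common way to turn frame-level logits into onset/offset events: threshold each frame's sigmoid probability and merge runs of consecutive positive frames. It is not taken from drasdic, and logits_to_selection_table may behave differently; the frame duration frame_s, the helper name logits_to_events, and applying the threshold to sigmoid probabilities are assumptions made for illustration.

```python
import math


def logits_to_events(logits, frame_s, threshold=0.5):
    """Convert per-frame logits into (begin_s, end_s) events.

    Assumes each logit covers `frame_s` seconds and that `threshold`
    applies to the sigmoid probability (illustrative assumptions).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    # sigmoid(x) >= t  <=>  x >= log(t / (1 - t)), so compare logits
    # directly and avoid overflow in math.exp for very negative logits
    cutoff = math.log(threshold / (1.0 - threshold))
    events = []
    start = None  # index of the first frame in the current positive run
    for i, logit in enumerate(logits):
        if logit >= cutoff and start is None:
            start = i  # a positive run begins at frame i
        elif logit < cutoff and start is not None:
            events.append((start * frame_s, i * frame_s))  # run ended before frame i
            start = None
    if start is not None:  # close a run that lasts until the final frame
        events.append((start * frame_s, len(logits) * frame_s))
    return events


# Hypothetical logits for 8 frames of 0.5 s each
logits = [-3.0, 2.0, 1.5, -1.0, -2.0, 0.7, 4.0, 3.0]
print(logits_to_events(logits, frame_s=0.5))  # [(0.5, 1.5), (2.5, 4.0)]
```

Raising the threshold shortens or drops events: with threshold=0.9 only the last two frames survive, giving [(3.0, 4.0)].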

Dataset: Fewshot Animal Sound Detection 13 (FASD13)

The dataset can be downloaded from the FASD13 link under Quick links; further details are in the Appendix.

Fewshot Bioacoustic Sound Event Detection (FSBSED) describes the task of detecting animal sounds in recordings based on only a handful of examples. It is of interest to researchers in ecology, animal behavior, and machine learning.

A collection of public FSBSED datasets was previously provided in Nolasco et al., 2023 and Liang et al., 2024, but these datasets were designated for model training and validation. We complement them with Fewshot Animal Sound Detection 13 (FASD13), a public benchmark intended for model evaluation. FASD13 consists of 13 bioacoustics datasets, each containing between 2 and 12 audio files. Eleven of these datasets come from previous studies and were chosen for their taxonomic diversity, varied recording conditions, and annotation quality; two (CC and JS) are presented here for the first time. All datasets were developed alongside studies of ecology or animal behavior and represent a range of realistic problems encountered in bioacoustics data.

We follow the data format of Nolasco et al., 2023: each audio file comes with annotations of the onsets and offsets of positive sound events, i.e., sounds from a predetermined category (such as a species label or call type). An N-shot detection system is given the audio up through the Nth positive event and must predict the onsets and offsets of positive events in the rest of the recording. Evaluation of N-shot detection systems is described in Nolasco et al., 2023.
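
The N-shot split can be made concrete with a small sketch: sort one recording's positive events by onset, take the first N as the support set, and treat everything after the end of the Nth event as the part the system must annotate. The helper name n_shot_split and the (onset, offset) tuple representation are illustrative only; the official evaluation is the one described in Nolasco et al., 2023.

```python
def n_shot_split(events, n=5):
    """Split one recording's positive events for N-shot detection.

    `events` is a list of (onset_s, offset_s) pairs. Returns
    (support, cutoff_s, query): the first n events by onset, the time at
    which the provided support audio ends, and the remaining events the
    system is expected to detect.
    """
    ordered = sorted(events)
    if len(ordered) < n:
        raise ValueError(f"need at least {n} positive events, got {len(ordered)}")
    support = ordered[:n]
    cutoff = support[-1][1]  # audio is provided up through the Nth positive event
    query = ordered[n:]  # events in the rest of the recording
    return support, cutoff, query


# Hypothetical annotations for a single recording (unsorted on purpose)
events = [(12.0, 12.4), (1.0, 1.3), (3.2, 3.5), (5.0, 5.6),
          (8.1, 8.3), (9.9, 10.2), (15.5, 16.0)]
support, cutoff, query = n_shot_split(events, n=5)
print(cutoff, query)  # 10.2 [(12.0, 12.4), (15.5, 16.0)]
```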

FASD13 Summary

Dataset	Full Name	N files	Duration (h)	N events	Recording type	Location	Taxa	Detection target
AS	AnuraSet	12	0.20	162	T. PAM	Brazil	Anura	Species
CC	Carrion Crow	10	10.00	2200	On-body	Spain	Corvus corone + Clamator glandarius	Species + Life Stage
GS	Gunshot	7	38.33	85	T. PAM	Gabon	Homo sapiens	Production Mechanism
HA	Hawaiian Birds	12	1.10	628	T. PAM	Hawaii, USA	Aves	Species
HG	Hainan Gibbon	9	72.00	483	T. PAM	Hainan, China	Nomascus hainanus	Species
HW	Humpback Whale	10	2.79	1565	U. PAM	North Pacific Ocean	Megaptera novaeangliae	Species
JS	Jumping Spider	4	0.23	924	Substrate	Laboratory	Habronattus	Sound Type
KD	Katydid	12	2.00	883	T. PAM	Panamá	Tettigoniidae	Species
MS	Marmoset	10	1.67	1369	Laboratory	Laboratory	Callithrix jacchus	Vocalization Type
PM	Powdermill	4	6.42	2032	T. PAM	Pennsylvania, USA	Passeriformes	Species
RG	Ruffed Grouse	2	1.50	34	T. PAM	Pennsylvania, USA	Bonasa umbellus	Species
RS	Rana Sierrae	7	1.87	552	U. PAM	California, USA	Rana sierrae	Species
RW	Right Whale	10	5.00	398	U. PAM	Gulf of St. Lawrence	Eubalaena glacialis	Species

T. PAM: terrestrial passive acoustic monitoring; U. PAM: underwater passive acoustic monitoring.
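
As a quick arithmetic check on the summary table, the snippet below copies the numeric columns (N files, duration in hours, N events) from the rows above, totals them, and confirms the range of 2 to 12 files per dataset stated earlier. The variable names are illustrative.

```python
# (code, N files, duration in hours, N events), copied from the FASD13 summary table
FASD13 = [
    ("AS", 12, 0.20, 162),
    ("CC", 10, 10.00, 2200),
    ("GS", 7, 38.33, 85),
    ("HA", 12, 1.10, 628),
    ("HG", 9, 72.00, 483),
    ("HW", 10, 2.79, 1565),
    ("JS", 4, 0.23, 924),
    ("KD", 12, 2.00, 883),
    ("MS", 10, 1.67, 1369),
    ("PM", 4, 6.42, 2032),
    ("RG", 2, 1.50, 34),
    ("RS", 7, 1.87, 552),
    ("RW", 10, 5.00, 398),
]

total_files = sum(files for _, files, _, _ in FASD13)
# Round to the table's precision to hide floating-point summation error
total_hours = round(sum(hours for _, _, hours, _ in FASD13), 2)
total_events = sum(n_events for _, _, _, n_events in FASD13)
file_range = (min(f for _, f, _, _ in FASD13), max(f for _, f, _, _ in FASD13))

print(len(FASD13), total_files, total_hours, total_events, file_range)
# 13 109 143.11 11315 (2, 12)
```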

Repository: earthspecies/drasdic_api

Folders and files

Latest commit

History

Repository files navigation

DRASDIC

Quick links:

Setup

Inference API

Example usage:

Further details

Dataset: Fewshot Animal Sound Detection 13 (FASD13)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages