The dcase2025_task4_evaluator is a set of scripts for calculating the ranking metric and other informative metrics used to analyze system performance on the evaluation dataset, using the corresponding ground truth, as part of the DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes.
https://dcase.community/challenge2025/task-spatial-semantic-segmentation-of-sound-scenes
dcase2025_task4_evaluator
├─docker
├─data
│ ├─eval_set
│ │ ├─oracle_target
│ │ │ ├─eval_0000_Buzzer.wav
│ │ │ ├─...
│ │ │ └─eval_2249_Speech.wav
│ │ └─soundscape
│ │ ├─ eval_0000.wav
│ │ ├─ ...
│ │ └─ eval_2289.wav
│ └─t4_submissions
│ ├─<system_name_1>
│ │ ├─eval_out
│ │ │ ├─ eval_0000_<estimated_class_name>
│ │ │ ├─ ...
│ │ │ └─ eval_2289_<estimated_class_name>
│ │ └─<system_name_1>.meta.yaml
│ ├─<system_name_2>
│ : ├─eval_out
│ : │ ├─eval_0000_<estimated_class_name>
│ : │ ├─...
│ : │ └─eval_2289_<estimated_class_name>
│ : └─<system_name_2>.meta.yaml
│ └─<system_name_n>
│ ├─eval_out
│ │ ├─eval_0000_<estimated_class_name>
│ │ ├─...
│ │ └─eval_2289_<estimated_class_name>
│ └─<system_name_n>.meta.yaml
│
├─results
│ ├─data
│ │ └─t4_submission
│ ├─ranking_results
│ └─ranking_results_latex
├─tools
│ ├─scripts
│ │ ├─calc_non_speech_peaq.sh
│ │ ├─calc_speech_pesq_stoi_waveform.sh
│ │ ├─check_audio.sh
│ │ ├─install_peaq.sh
│ │ ├─summarize_scores.sh
│ │ ├─convert_csv_to_latex.sh
│ │ ├─calc_classification_separation_scores.sh
│ │ ├─PEAQ_python.diff
│ │ └─summarize_estimate.sh
│ ├─calc_non_speech_peaq.py
│ ├─calc_speech_pesq_stoi.py
│ ├─check_audio.py
│ ├─summarize_scores.py
│ ├─convert_csv_to_latex.py
│ ├─dataset_s5_waveform.py
│ ├─calc_classification_separation_scores.py
│ ├─plot_common.py
│ ├─read_submission_yaml.py
│ ├─generate_task4_entries_yaml.py
│ ├─summarize_estimate.py
│ └─utils.py
├─README.md
├─ground_truth_zenodo.txt
├─evaluation_system_ranking.sh
└─evaluation_all_score.sh
The data/eval_set/soundscape directory contains the pre-mixed soundscapes released at the evaluation stage of the DCASE 2025 Challenge Task 4, available at Zenodo: DCASE2025Task4EvaluationDataset.
The data/eval_set/oracle_target directory includes the corresponding oracle target sources, which can be downloaded from Zenodo: Ground truth for DCASE2025Task4EvaluationDataset.
To conduct evaluations, participants should place their system outputs in the data/t4_submissions folder.
For details on the required folder structure and file naming conventions for system outputs, please refer to the DCASE 2025 Challenge Task 4 baseline repository.
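As an optional, purely illustrative sanity check (not part of the evaluator), a one-liner like the following lists each submission in data/t4_submissions and counts the entries in its eval_out folder, assuming the layout shown in the tree above:
# Hypothetical check: count the eval_out entries of every submission
$ for d in data/t4_submissions/*/; do echo "$(basename "$d"): $(ls "$d/eval_out" | wc -l) entries"; done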
The docker folder contains scripts and configuration for building and running the project's environment.
tools is the main utility directory, containing scripts for processing, evaluation, and data handling, while tools/scripts holds shell scripts for automating workflows and installations.
The evaluation can be performed by running evaluation_system_ranking.sh or evaluation_all_score.sh from the main directory, with the results saved in the results folder.
The DCASE 2025 Task 4 Evaluation Dataset consists of 2,290 files, numbered from 0000 to 2289. It includes a main subset, ranging from 0000 to 1619, which is used for system evaluation, while the remaining files are intended for informative analysis. Details are provided below.
- Main Subset
  - 0000 - 1619 (1620) : Used to calculate ranking scores in the DCASE2025 Challenge Task 4
- Partially known conditions
  - 1620 - 1709 (90) : Known IR (Synthesized using the RIRs included in the train split of the DCASE2025 Task4 Dataset (Development set))
  - 1710 - 1817 (108) : Known target sound event (Synthesized using target sound events included in the train split of the DCASE2025 Task4 Dataset (Development set))
  - 1818 - 1925 (108) : Known background noise (Synthesized using background noise included in the train split of the DCASE2025 Task4 Dataset (Development set))
  - 1926 - 2033 (108) : Known interference sound event (Synthesized using interference sound events included in the train split of the DCASE2025 Task4 Dataset (Development set))
- Other conditions
  - 2034 - 2141 (108) : Zero target sound event (soundscapes that do not contain any target sound event)
  - 2142 - 2249 (108) : Multiple same-class target sound events (different directions) in one soundscape
  - 2250 - 2289 (40) : Real recordings (recorded with FOA microphones in indoor and outdoor environments)
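Because the subsets are defined purely by file index, they can be selected directly from the file names. A minimal shell sketch (paths taken from the directory tree above, output file name arbitrary) that lists only the main-subset soundscapes used for ranking:
# List the main-subset soundscapes (indices 0000-1619) into a text file
$ for i in $(seq -f "%04g" 0 1619); do echo "data/eval_set/soundscape/eval_${i}.wav"; done > main_subset_files.txt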
Clone this repository from GitHub
$ git clone https://github.com/nttcslab/dcase2025_task4_evaluator.git
$ cd dcase2025_task4_evaluator
Clone PEAQ_python and apply patch
$ bash tools/scripts/install_peaq.sh
Install environment
# Using Docker
$ bash docker/build_docker.sh
$ docker run --rm -it --ipc=host \
--mount type=bind,source=$(pwd),target=/workspace \
dcase2025t4_evaluator
# Or using conda
$ apt-get install task-spooler # for tsp
$ conda env create -f environment.yaml
$ conda activate dcase2025t4_evaluator
# Or using pip (python=3.10)
$ apt-get install task-spooler # for tsp
$ python -m venv dcase2025t4eval
$ source dcase2025t4eval/bin/activate
$ pip install -r requirements.txt
The pre-mixed soundscapes can be downloaded from Zenodo: DCASE2025Task4EvaluationDataset using a method similar to that in dcase2025_task4_baseline.
They can be placed in data/eval_set by:
$ ln -s "path/to/DCASE2025Task4EvaluationDataset/eval_set/soundscape" \
"path/to/dcase2025_task4_evaluator/data/eval_set"
The oracle target sources can also be downloaded from Zenodo: Ground truth for DCASE2025Task4EvaluationDataset, and can be prepared as follows:
# Download all files from https://zenodo.org/records/16736216 and unzip
$ wget -i ground_truth_zenodo.txt
$ zip -s 0 DCASE2025Task4EvaluationGroundTruth.zip --out ground_truth_full.zip
$ unzip ground_truth_full.zip
# Place the oracle_target folder in dcase2025_task4_evaluator/data/eval_set
$ ln -s "path/to/DCASE2025Task4EvaluationGroundTruth/eval_set/oracle_target" \
"path/to/dcase2025_task4_evaluator/data/eval_set"
Note: If Docker is used, the container may not be able to access symbolic links created with the ln command.
In this case, the data should be copied directly into the folder instead; to do this, replace ln -s with cp -r in the above commands.
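For example, the copy-based equivalents of the two commands above would be:
$ cp -r "path/to/DCASE2025Task4EvaluationDataset/eval_set/soundscape" \
     "path/to/dcase2025_task4_evaluator/data/eval_set"
$ cp -r "path/to/DCASE2025Task4EvaluationGroundTruth/eval_set/oracle_target" \
     "path/to/dcase2025_task4_evaluator/data/eval_set"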
To generate the system output, refer to the instructions provided in dcase2025_task4_baseline.
The following command extracts the output of the baseline system, Nguyen_NTT_task4_1_out.zip, into the t4_submissions directory:
$ unzip path/to/Nguyen_NTT_task4_1_out.zip \
-d path/to/dcase2025_task4_evaluator/data/t4_submissions
After extracting the audio, the corresponding metadata YAML file, Nguyen_NTT_task4_1.meta.yaml (see the DCASE 2025 Challenge Task 4 webpage), should also be placed in data/t4_submissions/Nguyen_NTT_task4_1_out.
However, this step is optional if the evaluation is limited to the waveform output only.
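If the YAML file is used, placing it amounts to copying it next to the extracted audio (the source path below assumes the file has already been downloaded locally):
$ cp path/to/Nguyen_NTT_task4_1.meta.yaml \
     path/to/dcase2025_task4_evaluator/data/t4_submissions/Nguyen_NTT_task4_1_out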
To calculate all the scores on the Results page:
# For all systems
$ bash evaluation_all_score.sh
# For specific systems
$ bash evaluation_all_score.sh Nguyen_NTT_task4_1_out <system_name_2> ... <system_name_n>
Note: evaluation_all_score.sh may take ten hours or more to complete.
To calculate only the Systems Ranking scores (only generating system_ranking.csv, see the next section):
# For all systems
$ bash evaluation_system_ranking.sh
# For specific systems
$ bash evaluation_system_ranking.sh Nguyen_NTT_task4_1_out <system_name_2> ... <system_name_n>
In both cases, the evaluation can be performed on all systems or on specific systems.
For specific systems, the corresponding folder names in data/t4_submissions should be provided as arguments.
Submission meta YAML files are required to extract certain metadata for the results tables (including results on the development test set). However, the evaluation can still be performed without them—the corresponding fields will simply be left empty.
After running the script, the aggregated results will be saved in the results directory.
results
├─data
│ └─t4_submission
│ ├─<system_name>
│ └─...
├─ranking_results
│ ├─system_ranking.csv
│ ├─separation_detection_metrics.csv
│ ├─speech_stoi_pesq.csv
│ ├─non_speech_peaq.csv
│ ├─analysis_other_conditions.csv
│ └─...
└─ranking_results_latex
results/data/t4_submission/<system_name> contains raw scores for each system, while results/ranking_results summarizes and organizes these scores into CSV files to enable performance comparison across systems.
The data shown in the tables on the DCASE 2025 Challenge Task 4 Results page is derived from the CSV files in results/ranking_results, as follows:
- system_ranking.csv : provides the data for the Teams Ranking and Systems Ranking
- separation_detection_metrics.csv : provides the data for the Detailed Analysis of Separation and Detection Performance
- speech_stoi_pesq.csv and non_speech_peaq.csv : provide the data for the Detailed analysis focused on quality of separated speech
- analysis_other_conditions.csv : provides the data for the System's performance under partially known conditions and System performance in more challenging conditions
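For a quick look at any of these tables from the command line (without assuming their column layout), the CSV files can simply be pretty-printed, for example:
# Pretty-print the ranking CSV as aligned columns
$ column -s, -t results/ranking_results/system_ranking.csv | less -S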
The results/ranking_results_latex directory contains LaTeX table versions of the results/ranking_results CSV files, intended for use in publications or reports.
If you use this system, please cite the following papers:
- Binh Thien Nguyen, Masahiro Yasuda, Daiki Takeuchi, Daisuke Niizumi, Yasunori Ohishi, Noboru Harada, “Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes,” arXiv preprint arXiv:2503.22088, 2025, available at URL.
- Masahiro Yasuda, Binh Thien Nguyen, Noboru Harada, Romain Serizel, Mayank Mishra, Marc Delcroix, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Yasunori Ohishi, Tomohiro Nakatani, Takao Kawamura, Nobutaka Ono, “Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes,” arXiv preprint arXiv:2506.10676, 2025, available at URL.