The dcase2025_task4_evaluator is a set of scripts for calculating the ranking metric and other informative metrics used to analyze system performance on the evaluation dataset, using the corresponding ground truth, as part of the DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes.
https://dcase.community/challenge2025/task-spatial-semantic-segmentation-of-sound-scenes
dcase2025_task4_evaluator
├─docker
├─data
│ ├─eval_set
│ │ ├─oracle_target
│ │ │ ├─eval_0000_Buzzer.wav
│ │ │ ├─...
│ │ │ └─eval_2249_Speech.wav
│ │ └─soundscape
│ │ ├─ eval_0000.wav
│ │ ├─ ...
│ │ └─ eval_2289.wav
│ └─t4_submissions
│ ├─<system_name_1>
│ │ ├─eval_out
│ │ │ ├─ eval_0000_<estimated_class_name>
│ │ │ ├─ ...
│ │ │ └─ eval_2289_<estimated_class_name>
│ │ └─<system_name_1>.meta.yaml
│ ├─<system_name_2>
│ : ├─eval_out
│ : │ ├─eval_0000_<estimated_class_name>
│ : │ ├─...
│ : │ └─eval_2289_<estimated_class_name>
│ : └─<system_name_2>.meta.yaml
│ └─<system_name_n>
│ ├─eval_out
│ │ ├─eval_0000_<estimated_class_name>
│ │ ├─...
│ │ └─eval_2289_<estimated_class_name>
│ └─<system_name_n>.meta.yaml
│
├─results
│ ├─data
│ │ └─t4_submission
│ ├─ranking_results
│ └─ranking_results_latex
├─tools
│ ├─scripts
│ │ ├─calc_non_speech_peaq.sh
│ │ ├─calc_speech_pesq_stoi_waveform.sh
│ │ ├─check_audio.sh
│ │ ├─install_peaq.sh
│ │ ├─summarize_scores.sh
│ │ ├─convert_csv_to_latex.sh
│ │ ├─calc_classification_separation_scores.sh
│ │ ├─PEAQ_python.diff
│ │ └─summarize_estimate.sh
│ ├─calc_non_speech_peaq.py
│ ├─calc_speech_pesq_stoi.py
│ ├─check_audio.py
│ ├─summarize_scores.py
│ ├─convert_csv_to_latex.py
│ ├─dataset_s5_waveform.py
│ ├─calc_classification_separation_scores.py
│ ├─plot_common.py
│ ├─read_submission_yaml.py
│ ├─generate_task4_entries_yaml.py
│ ├─summarize_estimate.py
│ └─utils.py
├─README.md
├─ground_truth_zenodo.txt
├─evaluation_system_ranking.sh
└─evaluation_all_score.sh
The data/eval_set/soundscape directory contains the pre-mixed soundscapes released at the evaluation stage of the DCASE 2025 Challenge Task 4, available at Zenodo: DCASE2025Task4EvaluationDataset.
The data/eval_set/oracle_target directory includes the corresponding oracle target sources, which can be downloaded from Zenodo: Ground truth for DCASE2025Task4EvaluationDataset.
To conduct evaluations, participants should place their system outputs in the data/t4_submissions folder.
For details on the required folder structure and file naming conventions for system outputs, please refer to the DCASE 2025 Challenge Task 4 baseline repository.
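As an optional, purely illustrative sanity check (not part of the evaluator), a one-liner like the following lists each submission in data/t4_submissions and counts the entries in its eval_out folder, assuming the layout shown in the tree above:
# Hypothetical check: count the eval_out entries of every submission
$ for d in data/t4_submissions/*/; do echo "$(basename "$d"): $(ls "$d/eval_out" | wc -l) entries"; done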
The docker folder contains scripts and configuration for building and running the project's environment.
tools is the main utility directory, containing scripts for processing, evaluation, and data handling, while tools/scripts holds shell scripts for automating workflows and installations.
The evaluation can be performed by running evaluation_system_ranking.sh or evaluation_all_score.sh from the main directory, with the results saved in the results folder.
The DCASE 2025 Task 4 Evaluation Dataset consists of 2,290 files, numbered from 0000 to 2289. It includes a main subset, ranging from 0000 to 1619, which is used for system evaluation, while the remaining files are intended for informative analysis. Details are provided below.
- Main Subset
  - 0000 - 1619 (1620) : Used to calculate ranking scores in the DCASE2025 Challenge Task 4
- Partially known conditions
  - 1620 - 1709 (90) : Known IR (Synthesized using the RIRs included in the train split of the DCASE2025 Task4 Dataset (Development set))
  - 1710 - 1817 (108) : Known target sound event (Synthesized using target sound events included in the train split of the DCASE2025 Task4 Dataset (Development set))
  - 1818 - 1925 (108) : Known background noise (Synthesized using background noise included in the train split of the DCASE2025 Task4 Dataset (Development set))
  - 1926 - 2033 (108) : Known interference sound event (Synthesized using interference sound events included in the train split of the DCASE2025 Task4 Dataset (Development set))
- Other conditions
  - 2034 - 2141 (108) : Zero target sound event (soundscapes that do not contain any target sound event)
  - 2142 - 2249 (108) : Multiple same-class target sound events (different directions) in one soundscape
  - 2250 - 2289 (40) : Real recordings (recorded with FOA microphones in indoor and outdoor environments)
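Because the subsets are defined purely by file index, they can be selected directly from the file names. A minimal shell sketch (paths taken from the directory tree above, output file name arbitrary) that lists only the main-subset soundscapes used for ranking:
# List the main-subset soundscapes (indices 0000-1619) into a text file
$ for i in $(seq -f "%04g" 0 1619); do echo "data/eval_set/soundscape/eval_${i}.wav"; done > main_subset_files.txt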
Clone this repository from GitHub
$ git clone https://github.com/nttcslab/dcase2025_task4_evaluator.git
$ cd dcase2025_task4_evaluator
Clone PEAQ_python and apply patch
$ bash tools/scripts/install_peaq.sh
Install environment
# Using Docker
$ bash docker/build_docker.sh
$ docker run --rm -it --ipc=host \
--mount type=bind,source=$(pwd),target=/workspace \
dcase2025t4_evaluator
# Or using conda
$ apt-get install task-spooler # for tsp
$ conda env create -f environment.yaml
$ conda activate dcase2025t4_evaluator
# Or using pip (python=3.10)
$ apt-get install task-spooler # for tsp
$ python -m venv dcase2025t4eval
$ source dcase2025t4eval/bin/activate
$ pip install -r requirements.txt
The pre-mixed soundscapes can be downloaded from Zenodo: DCASE2025Task4EvaluationDataset using a method similar to that in dcase2025_task4_baseline.
They can be placed in data/eval_set by:
$ ln -s "path/to/DCASE2025Task4EvaluationDataset/eval_set/soundscape" \
"path/to/dcase2025_task4_evaluator/data/eval_set"
The oracle target sources can also be downloaded from Zenodo: Ground truth for DCASE2025Task4EvaluationDataset, and can be prepared as follows:
# Download all files from https://zenodo.org/records/16736216 and unzip
$ wget -i ground_truth_zenodo.txt
$ zip -s 0 DCASE2025Task4EvaluationGroundTruth.zip --out ground_truth_full.zip
$ unzip ground_truth_full.zip
# Place the oracle_target folder in dcase2025_task4_evaluator/data/eval_set
$ ln -s "path/to/DCASE2025Task4EvaluationGroundTruth/eval_set/oracle_target" \
"path/to/dcase2025_task4_evaluator/data/eval_set"
Note: If Docker is used, the container may not be able to access symbolic links created with the ln command.
In this case, the data should be copied directly into the folder instead; to do this, replace ln -s with cp -r in the above commands.
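For example, the copy-based equivalents of the two commands above would be:
$ cp -r "path/to/DCASE2025Task4EvaluationDataset/eval_set/soundscape" \
     "path/to/dcase2025_task4_evaluator/data/eval_set"
$ cp -r "path/to/DCASE2025Task4EvaluationGroundTruth/eval_set/oracle_target" \
     "path/to/dcase2025_task4_evaluator/data/eval_set"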
To generate the system output, refer to the instructions provided in dcase2025_task4_baseline.
The following command extracts the output of the baseline system, Nguyen_NTT_task4_1_out.zip, into the t4_submissions directory:
$ unzip path/to/Nguyen_NTT_task4_1_out.zip \
-d path/to/dcase2025_task4_evaluator/data/t4_submissions
After extracting the audio, the corresponding metadata YAML file, Nguyen_NTT_task4_1.meta.yaml (see the DCASE 2025 Challenge Task 4 webpage), should also be placed in data/t4_submissions/Nguyen_NTT_task4_1_out.
However, this step is optional if the evaluation is limited to the waveform output only.
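If the YAML file is used, placing it amounts to copying it next to the extracted audio (the source path below assumes the file has already been downloaded locally):
$ cp path/to/Nguyen_NTT_task4_1.meta.yaml \
     path/to/dcase2025_task4_evaluator/data/t4_submissions/Nguyen_NTT_task4_1_out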
To calculate all the scores on the Results page:
# For all systems
$ bash evaluation_all_score.sh
# For specific systems
$ bash evaluation_all_score.sh Nguyen_NTT_task4_1_out <system_name_2> ... <system_name_n>
Note: evaluation_all_score.sh may take ten hours or more to complete.
To calculate only the Systems Ranking scores (only generating system_ranking.csv, see the next section):
# For all systems
$ bash evaluation_system_ranking.sh
# For specific systems
$ bash evaluation_system_ranking.sh Nguyen_NTT_task4_1_out <system_name_2> ... <system_name_n>
In both cases, the evaluation can be performed on all systems or on specific systems.
For specific systems, the corresponding folder names in data/t4_submissions should be provided as arguments.
Submission meta YAML files are required to extract certain metadata for the results tables (including results on the development test set). However, the evaluation can still be performed without them—the corresponding fields will simply be left empty.
After running the script, the aggregated results will be saved in the results directory.
results
├─data
│ └─t4_submission
│ ├─<system_name>
│ └─...
├─ranking_results
│ ├─system_ranking.csv
│ ├─separation_detection_metrics.csv
│ ├─speech_stoi_pesq.csv
│ ├─non_speech_peaq.csv
│ ├─analysis_other_conditions.csv
│ └─...
└─ranking_results_latex
results/data/t4_submission/<system_name> contains raw scores for each system, while results/ranking_results summarizes and organizes these scores into CSV files to enable performance comparison across systems.
The data shown in the tables on the DCASE 2025 Challenge Task 4 Results page is derived from the CSV files in results/ranking_results, as follows:
- system_ranking.csv : provides the data for the Teams Ranking and Systems Ranking
- separation_detection_metrics.csv : provides the data for the Detailed Analysis of Separation and Detection Performance
- speech_stoi_pesq.csv and non_speech_peaq.csv : provide the data for the Detailed analysis focused on quality of separated speech
- analysis_other_conditions.csv : provides the data for the System's performance under partially known conditions and System performance in more challenging conditions
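For a quick look at any of these tables from the command line (without assuming their column layout), the CSV files can simply be pretty-printed, for example:
# Pretty-print the ranking CSV as aligned columns
$ column -s, -t results/ranking_results/system_ranking.csv | less -S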
The results/ranking_results_latex directory contains LaTeX table versions of the results/ranking_results CSV files, intended for use in publications or reports.
If you use this system, please cite the following papers:
- Binh Thien Nguyen, Masahiro Yasuda, Daiki Takeuchi, Daisuke Niizumi, Yasunori Ohishi, Noboru Harada, “Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes,” arXiv preprint arXiv:2503.22088, 2025, available at URL.
- Masahiro Yasuda, Binh Thien Nguyen, Noboru Harada, Romain Serizel, Mayank Mishra, Marc Delcroix, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Yasunori Ohishi, Tomohiro Nakatani, Takao Kawamura, Nobutaka Ono, “Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes,” arXiv preprint arXiv:2506.10676, 2025, available at URL.