dcase2025_task2_evaluator

The dcase2025_task2_evaluator is a script for calculating the AUC, pAUC, precision, recall, and F1 scores from the anomaly score list for the evaluation dataset in DCASE 2025 Challenge Task 2 "First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring."

https://dcase.community/challenge2025/task-first-shot-unsupervised-anomalous-sound-detection-for-machine-condition-monitoring

Description

The dcase2025_task2_evaluator consists of two scripts:

dcase2025_task2_evaluator.py
- This script calculates the AUC, pAUC, precision, recall, and F1 scores by using:
  - Ground truth of the normal and anomaly labels
  - Anomaly scores for each wave file listed in the csv file for each machine type, section, and domain
  - Detection results for each wave file listed in the csv file for each machine type, section, and domain
03_evaluation_eval_data.sh
- This script executes dcase2025_task2_evaluator.py.
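
As a rough illustration of what these metrics measure, the sketch below computes AUC from anomaly scores, and precision, recall, and F1 from binary detection results, on a toy example. The function names and data are hypothetical and this is not the evaluator's own code. pAUC, which restricts the ROC curve to false-positive rates up to MAX_FPR, is not reproduced here.

```python
def auc(labels, scores):
    """AUC as the fraction of (anomaly, normal) pairs ranked correctly.

    labels: 1 for anomalous clips, 0 for normal clips.
    Tied scores count as half a correct pair.
    """
    anomalous = [s for y, s in zip(labels, scores) if y == 1]
    normal = [s for y, s in zip(labels, scores) if y == 0]
    if not anomalous or not normal:
        raise ValueError("need at least one normal and one anomalous clip")
    wins = sum(
        1.0 if a > n else 0.5 if a == n else 0.0
        for a in anomalous
        for n in normal
    )
    return wins / (len(anomalous) * len(normal))


def precision_recall_f1(labels, decisions):
    """Precision, recall, and F1 for binary decisions (1 = detected as anomaly)."""
    tp = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 1)
    fp = sum(1 for y, d in zip(labels, decisions) if y == 0 and d == 1)
    fn = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical): three normal clips followed by three anomalous clips.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]
decisions = [0, 1, 0, 1, 0, 1]

print(auc(labels, scores))
print(precision_recall_f1(labels, decisions))
```

In this toy case, 7 of the 9 anomaly/normal pairs are ordered correctly, and the decisions contain two true positives, one false positive, and one false negative.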

Usage

1. Clone repository

Clone this repository from GitHub.

2. Prepare data

Anomaly scores
- Generate the csv files anomaly_score_<machine_type>_section_<section_index>_test.csv and decision_result_<machine_type>_section_<section_index>_test.csv, or, in the DCASE2025 baseline style, anomaly_score_DCASE2025T2<machine_type>_section_<section>_test_seed<seed><tag>_Eval.csv and decision_result_DCASE2025T2<machine_type>_section_<section>_test_seed<seed><tag>_Eval.csv, by running a system on the evaluation dataset. (The format information is described here.)
- Rename the directory containing the csv files to a team name.
- Move the directory into ./teams/.
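
The two naming styles can be generated programmatically. The helper below is a hypothetical sketch (the function name and its arguments are not part of this repository); it only builds the file names described above and says nothing about the file contents.

```python
def result_file_names(machine_type, section="00", seed=None, tag=""):
    """Return (anomaly_score_name, decision_result_name) for one machine type.

    With seed=None the simple style is used; otherwise the
    DCASE2025 baseline style with seed and tag is used.
    """
    if seed is None:
        suffix = f"{machine_type}_section_{section}_test.csv"
    else:
        suffix = (
            f"DCASE2025T2{machine_type}_section_{section}"
            f"_test_seed{seed}{tag}_Eval.csv"
        )
    return f"anomaly_score_{suffix}", f"decision_result_{suffix}"


# Simple style.
print(result_file_names("ToyPet"))

# Baseline style, using the default seed and tag listed under "Change parameters".
print(result_file_names("ToyPet", seed=13711, tag="_id(0_)"))
```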

3. Check directory structure

./dcase2025_task2_evaluator
- /dcase2025_task2_evaluator.py
- /03_evaluation_eval_data.sh
- /ground_truth_attributes
  - ground_truth_AutoTrash_section_00_test.csv
  - ground_truth_BandSealer_section_00_test.csv
  - ...
- /ground_truth_data
  - ground_truth_AutoTrash_section_00_test.csv
  - ground_truth_BandSealer_section_00_test.csv
  - ...
- /ground_truth_domain
  - ground_truth_AutoTrash_section_00_test.csv
  - ground_truth_BandSealer_section_00_test.csv
  - ...
- /teams
  - /<team_name_1>
    - /<system_name_1>
      - anomaly_score_AutoTrash_section_00_test.csv
      - anomaly_score_BandSealer_section_00_test.csv
      - ...
      - decision_result_ToyPet_section_00_test.csv
      - decision_result_ToyRCCar_section_00_test.csv
    - /<system_name_2>
      - anomaly_score_DCASE2025T2AutoTrash_section_00_test_seed<--seed><--tag>_Eval.csv
      - anomaly_score_DCASE2025T2BandSealer_section_00_test_seed<--seed><--tag>_Eval.csv
      - ...
      - decision_result_DCASE2025T2ToyPet_section_00_test_seed<--seed><--tag>_Eval.csv
      - decision_result_DCASE2025T2ToyRCCar_section_00_test_seed<--seed><--tag>_Eval.csv
  - /<team_name_2>
    - /<system_name_3>
      - anomaly_score_AutoTrash_section_00_test.csv
      - anomaly_score_BandSealer_section_00_test.csv
      - ...
      - decision_result_ToyPet_section_00_test.csv
      - decision_result_ToyRCCar_section_00_test.csv
  - ...
- /teams_result
  - <system_name_1>_result.csv
  - <system_name_2>_result.csv
  - <system_name_3>_result.csv
  - ...
- /teams_additional_result (used when --out_all==True)
  - teams_official_score.csv
  - teams_official_score_paper.csv
  - teams_section_00_auc.csv
  - teams_section_00_score.csv
  - /<system_name_1>
    - official_score.csv
    - <system_name_1>_AutoTrash_section_00_anm_score.png
    - ...
    - <system_name_1>_ToyRCCar_section_00_anm_score.png
  - /<system_name_2>
    - official_score.csv
    - <system_name_2>_AutoTrash_section_00_anm_score.png
    - ...
    - <system_name_2>_ToyRCCar_section_00_anm_score.png
  - /<system_name_3>
    - official_score.csv
    - <system_name_3>_AutoTrash_section_00_anm_score.png
    - ...
    - <system_name_3>_ToyRCCar_section_00_anm_score.png
  - ...
- /tools
  - plot_anm_score.py
  - test_plots.py
- /README.md
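
Before running the evaluator, it can help to confirm that a system directory contains every expected file. The following is a minimal sketch for the simple naming style, assuming one section (00) per machine type and the eight machine types that appear in the aggregated results; the function name is hypothetical and the evaluator itself may perform different checks.

```python
import os
import tempfile

# Machine types taken from the aggregated result table in this README.
MACHINE_TYPES = [
    "AutoTrash", "HomeCamera", "ToyPet", "ToyRCCar",
    "BandSealer", "CoffeeGrinder", "Polisher", "ScrewFeeder",
]


def missing_result_files(system_dir, machine_types=MACHINE_TYPES, section="00"):
    """List expected csv files (simple naming style) absent from system_dir."""
    expected = [
        f"{kind}_{machine}_section_{section}_test.csv"
        for machine in machine_types
        for kind in ("anomaly_score", "decision_result")
    ]
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(system_dir, name))
    ]


# Demonstration on a temporary directory holding a single file.
with tempfile.TemporaryDirectory() as system_dir:
    path = os.path.join(system_dir, "anomaly_score_AutoTrash_section_00_test.csv")
    open(path, "w").close()
    missing = missing_result_files(system_dir)

print(len(missing), "files missing, first:", missing[0])
```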

4. Change parameters

The parameters are defined in the script dcase2025_task2_evaluator.py as follows.

MAX_FPR
- The FPR threshold for pAUC (default: 0.1).
--result_dir
- The output directory (default: ./teams_result/).
--teams_root_dir
- The directory containing the team results (default: ./teams/).
--dir_depth
- The depth at which --teams_root_dir is searched using glob (default: 2).
- If --dir_depth=2, the search is glob.glob(<teams_root_dir>/*/*).
--tag
- The file name tag (default: _id(0_)).
- If the file names follow the DCASE2025 baseline style, change this parameter as necessary.
--seed
- The seed used during training (default: 13711).
- If the file names follow the DCASE2025 baseline style, change this parameter as necessary.
--out_all
- If this parameter is True, supplemental data are exported (default: False).
- --additional_result_dir
- The output directory for additional results (default: ./teams_additional_result/).
- Used when --out_all==True.
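
To make the effect of --dir_depth concrete, the sketch below builds the corresponding glob pattern and lists the directories it matches. The function name is hypothetical, and filtering the matches to directories is an assumption of this sketch; the evaluator's own search may differ in detail.

```python
import glob
import os
import tempfile


def find_system_dirs(teams_root_dir, dir_depth=2):
    """Return directories found dir_depth levels below teams_root_dir."""
    # dir_depth=2 gives the pattern <teams_root_dir>/*/*
    pattern = os.path.join(teams_root_dir, *(["*"] * dir_depth))
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


# Demonstration on a temporary teams/<team_name>/<system_name> layout.
with tempfile.TemporaryDirectory() as root:
    for team, system in [
        ("team_a", "system_1"),
        ("team_a", "system_2"),
        ("team_b", "system_3"),
    ]:
        os.makedirs(os.path.join(root, team, system))
    found = [os.path.relpath(p, root) for p in find_system_dirs(root, dir_depth=2)]
    teams = [os.path.relpath(p, root) for p in find_system_dirs(root, dir_depth=1)]

print(found)  # system directories, one per <team_name>/<system_name>
print(teams)  # team directories only
```

With the layout shown in the directory structure, a depth of 2 reaches the system directories, whereas a depth of 1 would stop at the team directories.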

5. Run script

Run the script dcase2025_task2_evaluator.py

$ python dcase2025_task2_evaluator.py

or

$ bash 03_evaluation_eval_data.sh

The script dcase2025_task2_evaluator.py calculates the AUC, pAUC, precision, recall, and F1 scores for each machine type, section, and domain, and writes them to csv files (<system_name_1>_result.csv, <system_name_2>_result.csv, ...) in --result_dir (default: ./teams_result/). If --out_all=True, the results of all teams are also aggregated into csv files (teams_official_score.csv, teams_official_score_paper.csv) in --additional_result_dir (default: ./teams_additional_result/).

6. Check results

You can check the AUC, pAUC, precision, recall, and F1 scores in the <system_name_N>_result.csv in --result_dir. The AUC, pAUC, precision, recall, and F1 scores for each machine type, section, and domain are listed as follows:

<system_name_N>_result.csv

AutoTrash
section,AUC (all),AUC (source),AUC (target),pAUC,precision (source),precision (target),recall (source),recall (target),F1 score (source),F1 score (target)
00,0.5769000000000001,0.8102,0.3436,0.5421052631578948,0.5119047619047619,0.5,0.86,1.0,0.6417910447761195,0.6666666666666666
,,AUC,pAUC,precision,recall,F1 score
arithmetic mean,,0.5769,0.5421052631578948,0.5059523809523809,0.9299999999999999,0.6542288557213931
harmonic mean,,0.48255281677933787,0.5421052631578948,0.5058823529411764,0.9247311827956988,0.6539923954372623
source harmonic mean,,0.8102,0.5421052631578948,0.5119047619047619,0.86,0.6417910447761195
target harmonic mean,,0.3436,0.5421052631578948,0.5,1.0,0.6666666666666666

...

ToyRCCar
section,AUC (all),AUC (source),AUC (target),pAUC,precision (source),precision (target),recall (source),recall (target),F1 score (source),F1 score (target)
00,0.5777999999999999,0.5284,0.6271999999999999,0.5552631578947368,0.6818181818181818,0.4666666666666667,0.6,0.14,0.6382978723404256,0.2153846153846154
,,AUC,pAUC,precision,recall,F1 score
arithmetic mean,,0.5777999999999999,0.5552631578947368,0.5742424242424242,0.37,0.4268412438625205
harmonic mean,,0.5735764624437522,0.5552631578947368,0.554089709762533,0.22702702702702707,0.3220858895705522
source harmonic mean,,0.5284,0.5552631578947368,0.6818181818181818,0.6,0.6382978723404256
target harmonic mean,,0.6271999999999999,0.5552631578947368,0.4666666666666667,0.14,0.2153846153846154

...

,,AUC,pAUC,precision,recall,F1 score
"arithmetic mean over all machine types, sections, and domains",,0.5858625,0.5468421052631579,0.5183191989199928,0.81,0.6104748915566067
"harmonic mean over all machine types, sections, and domains",,0.5437772342298658,0.5452967030441773,0.5150751507085616,0.6207167119350003,0.5629829979642624
"source harmonic mean over all machine types, sections, and domains",,0.6879822239700398,0.5452967030441773,0.5281415194743965,0.6953393434776113,0.6003160139808418
"target harmonic mean over all machine types, sections, and domains",,0.44954916968961445,0.5452967030441773,0.5026397039837188,0.5605585275183607,0.5300215218366398

official score,,0.5442827820713174
official score ci95,,1.271407576916618e-05
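
The per-machine "arithmetic mean" and "harmonic mean" rows can be checked by hand from the source and target columns. The short sketch below does this for the AutoTrash row in the listing above; it is a consistency check on the printed numbers and not the evaluator's own code.

```python
from statistics import harmonic_mean, mean

# Values copied from the AutoTrash section 00 row.
auc_source, auc_target = 0.8102, 0.3436
precision_source, precision_target = 0.5119047619047619, 0.5

auc_amean = mean([auc_source, auc_target])
auc_hmean = harmonic_mean([auc_source, auc_target])
precision_hmean = harmonic_mean([precision_source, precision_target])

# Compare with the "arithmetic mean" and "harmonic mean" rows for AutoTrash.
print(auc_amean)
print(auc_hmean)
print(precision_hmean)
```

Because the harmonic mean is pulled toward the smaller value, a low target-domain AUC lowers the combined figure more than it lowers the arithmetic mean.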

Aggregated results for each baseline are listed as follows:

System,metric,h-mean,a-mean,AutoTrash,HomeCamera,ToyPet,ToyRCCar,BandSealer,CoffeeGrinder,Polisher,ScrewFeeder
DCASE2025_baseline_task2_MAHALA,AUC (source),0.719933864244911,0.729725,0.7726000000000001,0.8616,0.6981999999999999,0.5586,0.7638,0.7498,0.7041999999999999,0.729
DCASE2025_baseline_task2_MAHALA,AUC (target),0.4788331261490967,0.508175,0.526,0.42640000000000006,0.509,0.5548,0.3268,0.4042,0.5278,0.7904000000000001
DCASE2025_baseline_task2_MAHALA,"pAUC (source, target)",0.5459161077739156,0.5515131578947368,0.541578947368421,0.5184210526315789,0.5684210526315789,0.54,0.49105263157894735,0.5142105263157895,0.5378947368421052,0.7005263157894737
DCASE2025_baseline_task2_MAHALA,TOTAL score,0.5650558189601554,0.596471052631579,,,,,,,,
DCASE2025_baseline_task2_MSE,AUC (source),0.6879822239700398,0.6996249999999999,0.8102,0.8140000000000001,0.677,0.5284,0.7198,0.7303999999999999,0.6686000000000001,0.6486
DCASE2025_baseline_task2_MSE,AUC (target),0.44954916968961445,0.4721,0.3436,0.4976,0.36699999999999994,0.6271999999999999,0.3956,0.4436,0.443,0.6592
DCASE2025_baseline_task2_MSE,"pAUC (source, target)",0.5452967030441773,0.5468421052631579,0.5421052631578948,0.5284210526315789,0.55,0.5552631578947368,0.5205263157894737,0.5342105263157895,0.5231578947368422,0.6210526315789474
DCASE2025_baseline_task2_MSE,TOTAL score,0.5442827820713174,0.5728557017543859,,,,,,,,
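
The TOTAL score rows appear consistent with taking the harmonic mean (h-mean) and arithmetic mean (a-mean) over all 24 per-machine values of AUC (source), AUC (target), and pAUC. The sketch below checks this against the DCASE2025_baseline_task2_MSE rows above. It is a consistency check on the listed numbers under that assumption, not the evaluator's implementation.

```python
from statistics import harmonic_mean, mean

# Per-machine values copied from the DCASE2025_baseline_task2_MSE rows
# (AutoTrash, HomeCamera, ToyPet, ToyRCCar, BandSealer, CoffeeGrinder,
#  Polisher, ScrewFeeder).
auc_source = [0.8102, 0.8140000000000001, 0.677, 0.5284,
              0.7198, 0.7303999999999999, 0.6686000000000001, 0.6486]
auc_target = [0.3436, 0.4976, 0.36699999999999994, 0.6271999999999999,
              0.3956, 0.4436, 0.443, 0.6592]
pauc = [0.5421052631578948, 0.5284210526315789, 0.55, 0.5552631578947368,
        0.5205263157894737, 0.5342105263157895, 0.5231578947368422,
        0.6210526315789474]

all_values = auc_source + auc_target + pauc

total_hmean = harmonic_mean(all_values)
total_amean = mean(all_values)

# Compare with the h-mean and a-mean columns of the TOTAL score row.
print(total_hmean)
print(total_amean)
```

The h-mean of the TOTAL score row matches the "official score" value in the per-system result file shown earlier.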

Citation

If you use this system, please cite all of the following four papers:

Tomoya Nishida, Noboru Harada, Daisuke Niizumi, Davide Albertini, Roberto Sannino, Simone Pradolini, Filippo Augusti, Keisuke Imoto, Kota Dohi, Harsh Purohit, Takashi Endo, and Yohei Kawaguchi. Description and discussion on DCASE 2025 challenge task 2: first-shot unsupervised anomalous sound detection for machine condition monitoring. In arXiv e-prints: 2506.10097, 2025.
Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, and Shoichiro Saito. ToyADMOS2: another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions. In Proceedings of the Detection and Classification of Acoustic Scenes and Events Workshop (DCASE), 1–5. Barcelona, Spain, November 2021.
Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, and Yohei Kawaguchi. MIMII DG: sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task. In Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022). Nancy, France, November 2022.
Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, and Masahiro Yasuda. First-shot anomaly detection for machine condition monitoring: a domain generalization baseline. Proceedings of 31st European Signal Processing Conference (EUSIPCO), pages 191–195, 2023.
