The dcase2025_task2_evaluator is a script for calculating the AUC, pAUC, precision, recall, and F1 scores from the anomaly score list for the evaluation dataset in DCASE 2025 Challenge Task 2 "First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring."
The dcase2025_task2_evaluator consists of two scripts:
dcase2025_task2_evaluator.py
- This script outputs the AUC, pAUC, precision, recall, and F1 scores by using:
- Ground truth of the normal and anomaly labels
- Anomaly scores for each wave file listed in the csv file for each machine type, section, and domain
- Detection results for each wave file listed in the csv file for each machine type, section, and domain
03_evaluation_eval_data.sh
- This script executes dcase2025_task2_evaluator.py.
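As a rough illustration of what these metrics mean, the sketch below computes AUC from ground-truth labels and anomaly scores, and precision, recall, and F1 from binary detection results. It is a minimal pure-Python sketch, not the evaluator's own code: the function names and the toy data are hypothetical, and it assumes that labels and decisions use 1 for anomaly and 0 for normal.

```python
def auc(labels, scores):
    """AUC as the fraction of (anomaly, normal) pairs ranked correctly.

    Ties between an anomalous and a normal score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def precision_recall_f1(labels, decisions):
    """Precision, recall, and F1 of binary decisions against the labels."""
    tp = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 1)
    fp = sum(1 for y, d in zip(labels, decisions) if y == 0 and d == 1)
    fn = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: three normal clips followed by three anomalous clips.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]
decisions = [0, 1, 0, 1, 0, 1]

print(auc(labels, scores))
print(precision_recall_f1(labels, decisions))
```

In this toy data, 7 of the 9 (anomaly, normal) pairs are ranked correctly, and the decisions contain two true positives, one false positive, and one false negative.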
Clone this repository from GitHub.
- Anomaly scores
- Generate csv files anomaly_score_<machine_type>_section_<section_index>_test.csv and decision_result_<machine_type>_section_<section_index>_test.csv, or anomaly_score_DCASE2025T2<machine_type>_section_<section>_test_seed<seed><tag>_Eval.csv and decision_result_DCASE2025T2<machine_type>_section_<section>_test_seed<seed><tag>_Eval.csv, by using a system for the evaluation dataset. (The format information is described here.)
- Rename the directory containing the csv files to a team name
- Move the directory into ./teams/
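The two file-naming styles above can be sketched as a small helper. This is only an illustration: the function name result_file_names is hypothetical, and the two-digit section index is an assumption based on the section_00 examples in this README.

```python
def result_file_names(machine_type, section, seed=None, tag=""):
    """Return the anomaly-score and decision-result csv names for one section.

    With seed=None the short style is used; otherwise the DCASE2025
    baseline style with the seed and tag embedded in the name is used.
    """
    stem = f"{machine_type}_section_{section:02d}_test"
    if seed is not None:
        stem = f"DCASE2025T2{stem}_seed{seed}{tag}_Eval"
    return [f"{kind}_{stem}.csv" for kind in ("anomaly_score", "decision_result")]


# Short style
print(result_file_names("AutoTrash", 0))
# Baseline style, using the default --seed and --tag values of the evaluator
print(result_file_names("AutoTrash", 0, seed=13711, tag="_id(0_)"))
```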
- ./dcase2025_task2_evaluator
- /dcase2025_task2_evaluator.py
- /03_evaluation_eval_data.sh
- /ground_truth_attributes
- ground_truth_AutoTrash_section_00_test.csv
- ground_truth_BandSealer_section_00_test.csv
- ...
- /ground_truth_data
- ground_truth_AutoTrash_section_00_test.csv
- ground_truth_BandSealer_section_00_test.csv
- ...
- /ground_truth_domain
- ground_truth_AutoTrash_section_00_test.csv
- ground_truth_BandSealer_section_00_test.csv
- ...
- /teams
- /<team_name_1>
- /<system_name_1>
- anomaly_score_AutoTrash_section_00_test.csv
- anomaly_score_BandSealer_section_00_test.csv
- ...
- decision_result_ToyPet_section_00_test.csv
- decision_result_ToyRCCar_section_00_test.csv
- /<system_name_2>
- anomaly_score_DCASE2025T2AutoTrash_section_00_test_seed<--seed><--tag>_Eval.csv
- anomaly_score_DCASE2025T2BandSealer_section_00_test_seed<--seed><--tag>_Eval.csv
- ...
- decision_result_DCASE2025T2ToyPet_section_00_test_seed<--seed><--tag>_Eval.csv
- decision_result_DCASE2025T2ToyRCCar_section_00_test_seed<--seed><--tag>_Eval.csv
- /<team_name_2>
- /<system_name_3>
- anomaly_score_AutoTrash_section_00_test.csv
- anomaly_score_BandSealer_section_00_test.csv
- ...
- decision_result_ToyPet_section_00_test.csv
- decision_result_ToyRCCar_section_00_test.csv
- ...
- /teams_result
- <system_name_1>_result.csv
- <system_name_2>_result.csv
- <system_name_3>_result.csv
- ...
- /teams_additional_result (only when out_all==True)
- teams_official_score.csv
- teams_official_score_paper.csv
- teams_section_00_auc.csv
- teams_section_00_score.csv
- /<system_name_1>
- official_score.csv
- <system_name_1>_AutoTrash_section_00_anm_score.png
- ...
- <system_name_1>_ToyRCCar_section_00_anm_score.png
- /<system_name_2>
- official_score.csv
- <system_name_2>_AutoTrash_section_00_anm_score.png
- ...
- <system_name_2>_ToyRCCar_section_00_anm_score.png
- /<system_name_3>
- official_score.csv
- <system_name_3>_AutoTrash_section_00_anm_score.png
- ...
- <system_name_3>_ToyRCCar_section_00_anm_score.png
- ...
- /tools
- plot_anm_score.py
- test_plots.py
- /README.md
The parameters are defined in the script dcase2025_task2_evaluator.py as follows.
- MAX_FPR
- The FPR threshold for pAUC : default 0.1
- --result_dir
- The output directory : default ./teams_result/
- --teams_root_dir
- Directory containing team results. : default ./teams/
- --dir_depth
- The directory depth at which --teams_root_dir is searched using glob. : default 2
- If --dir_depth=2, then glob.glob(<teams_root_dir>/*/*)
- --tag
- File name tag. : default _id(0_)
- If the file names follow the DCASE2025 baseline style, change this parameter as necessary.
- --seed
- Seed used during training. : default 13711
- If the file names follow the DCASE2025 baseline style, change this parameter as necessary.
- --out_all
- If this parameter is True, export supplemental data. : default False
- --additional_result_dir
- The output directory for additional results. : default ./teams_additional_result/
- Used when --out_all==True.
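The effect of --dir_depth on the glob pattern can be sketched as follows. This is a minimal illustration rather than the evaluator's own code: the helper name system_dirs, the directory filter, and the sorting are assumptions, and the team and system directories are created in a temporary directory purely for demonstration.

```python
import glob
import os
import tempfile


def system_dirs(teams_root_dir, dir_depth=2):
    """List directories found dir_depth levels below teams_root_dir."""
    # dir_depth=2 yields the pattern <teams_root_dir>/*/*
    pattern = os.path.join(teams_root_dir, *["*"] * dir_depth)
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/<team_name>/<system_name>
    for team, system in [
        ("team_a", "system_1"),
        ("team_a", "system_2"),
        ("team_b", "system_3"),
    ]:
        os.makedirs(os.path.join(root, team, system))
    found = [os.path.relpath(p, root) for p in system_dirs(root, dir_depth=2)]

print(found)
```

With dir_depth=1 the same helper would return the team directories instead of the system directories.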
Run the script dcase2025_task2_evaluator.py
$ python dcase2025_task2_evaluator.py
or
$ bash 03_evaluation_eval_data.sh
The script dcase2025_task2_evaluator.py calculates the AUC, pAUC, precision, recall, and F1 scores for each machine type, section, and domain, and outputs the calculated scores to csv files (<system_name_1>_result.csv, <system_name_2>_result.csv, ...) in --result_dir (default: ./teams_result/).
If --out_all=True, the results of each team are then aggregated into csv files (teams_official_score.csv, teams_official_score_paper.csv) in --additional_result_dir (default: ./teams_additional_result/).
You can check the AUC, pAUC, precision, recall, and F1 scores in the <system_name_N>_result.csv in --result_dir.
The AUC, pAUC, precision, recall, and F1 scores for each machine type, section, and domain are listed as follows:
<system_name_N>_result.csv
AutoTrash
section,AUC (all),AUC (source),AUC (target),pAUC,precision (source),precision (target),recall (source),recall (target),F1 score (source),F1 score (target)
00,0.5769000000000001,0.8102,0.3436,0.5421052631578948,0.5119047619047619,0.5,0.86,1.0,0.6417910447761195,0.6666666666666666
,,AUC,pAUC,precision,recall,F1 score
arithmetic mean,,0.5769,0.5421052631578948,0.5059523809523809,0.9299999999999999,0.6542288557213931
harmonic mean,,0.48255281677933787,0.5421052631578948,0.5058823529411764,0.9247311827956988,0.6539923954372623
source harmonic mean,,0.8102,0.5421052631578948,0.5119047619047619,0.86,0.6417910447761195
target harmonic mean,,0.3436,0.5421052631578948,0.5,1.0,0.6666666666666666
...
ToyRCCar
section,AUC (all),AUC (source),AUC (target),pAUC,precision (source),precision (target),recall (source),recall (target),F1 score (source),F1 score (target)
00,0.5777999999999999,0.5284,0.6271999999999999,0.5552631578947368,0.6818181818181818,0.4666666666666667,0.6,0.14,0.6382978723404256,0.2153846153846154
,,AUC,pAUC,precision,recall,F1 score
arithmetic mean,,0.5777999999999999,0.5552631578947368,0.5742424242424242,0.37,0.4268412438625205
harmonic mean,,0.5735764624437522,0.5552631578947368,0.554089709762533,0.22702702702702707,0.3220858895705522
source harmonic mean,,0.5284,0.5552631578947368,0.6818181818181818,0.6,0.6382978723404256
target harmonic mean,,0.6271999999999999,0.5552631578947368,0.4666666666666667,0.14,0.2153846153846154
...
,,AUC,pAUC,precision,recall,F1 score
"arithmetic mean over all machine types, sections, and domains",,0.5858625,0.5468421052631579,0.5183191989199928,0.81,0.6104748915566067
"harmonic mean over all machine types, sections, and domains",,0.5437772342298658,0.5452967030441773,0.5150751507085616,0.6207167119350003,0.5629829979642624
"source harmonic mean over all machine types, sections, and domains",,0.6879822239700398,0.5452967030441773,0.5281415194743965,0.6953393434776113,0.6003160139808418
"target harmonic mean over all machine types, sections, and domains",,0.44954916968961445,0.5452967030441773,0.5026397039837188,0.5605585275183607,0.5300215218366398
official score,,0.5442827820713174
official score ci95,,1.271407576916618e-05
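The per-machine-type summary rows in the sample above are consistent with taking the arithmetic and harmonic means of the source and target values. The sketch below checks this for the AutoTrash rows using Python's statistics module; it only reproduces the numbers shown in the sample and is not taken from the evaluator itself.

```python
from statistics import harmonic_mean, mean

# AutoTrash, section 00 values copied from the sample csv above
auc_source, auc_target = 0.8102, 0.3436
recall_source, recall_target = 0.86, 1.0

auc_amean = mean([auc_source, auc_target])
auc_hmean = harmonic_mean([auc_source, auc_target])
recall_hmean = harmonic_mean([recall_source, recall_target])

print(auc_amean)     # compare with the "arithmetic mean" row, AUC column
print(auc_hmean)     # compare with the "harmonic mean" row, AUC column
print(recall_hmean)  # compare with the "harmonic mean" row, recall column
```

Because the harmonic mean is pulled toward the smaller value, the low target-domain AUC lowers the harmonic mean (about 0.4826) well below the arithmetic mean (0.5769).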
Aggregated results for each baseline are listed as follows:
System,metric,h-mean,a-mean,AutoTrash,HomeCamera,ToyPet,ToyRCCar,BandSealer,CoffeeGrinder,Polisher,ScrewFeeder
DCASE2025_baseline_task2_MAHALA,AUC (source),0.719933864244911,0.729725,0.7726000000000001,0.8616,0.6981999999999999,0.5586,0.7638,0.7498,0.7041999999999999,0.729
DCASE2025_baseline_task2_MAHALA,AUC (target),0.4788331261490967,0.508175,0.526,0.42640000000000006,0.509,0.5548,0.3268,0.4042,0.5278,0.7904000000000001
DCASE2025_baseline_task2_MAHALA,"pAUC (source, target)",0.5459161077739156,0.5515131578947368,0.541578947368421,0.5184210526315789,0.5684210526315789,0.54,0.49105263157894735,0.5142105263157895,0.5378947368421052,0.7005263157894737
DCASE2025_baseline_task2_MAHALA,TOTAL score,0.5650558189601554,0.596471052631579,,,,,,,,
DCASE2025_baseline_task2_MSE,AUC (source),0.6879822239700398,0.6996249999999999,0.8102,0.8140000000000001,0.677,0.5284,0.7198,0.7303999999999999,0.6686000000000001,0.6486
DCASE2025_baseline_task2_MSE,AUC (target),0.44954916968961445,0.4721,0.3436,0.4976,0.36699999999999994,0.6271999999999999,0.3956,0.4436,0.443,0.6592
DCASE2025_baseline_task2_MSE,"pAUC (source, target)",0.5452967030441773,0.5468421052631579,0.5421052631578948,0.5284210526315789,0.55,0.5552631578947368,0.5205263157894737,0.5342105263157895,0.5231578947368422,0.6210526315789474
DCASE2025_baseline_task2_MSE,TOTAL score,0.5442827820713174,0.5728557017543859,,,,,,,,
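The TOTAL score rows above are numerically consistent with a harmonic mean (h-mean column) and an arithmetic mean (a-mean column) taken over the source AUC, target AUC, and pAUC of all machine types. The sketch below reproduces the DCASE2025_baseline_task2_MSE values from the table; it is a check on the listed numbers under that assumption, not the evaluator's own code.

```python
from statistics import harmonic_mean, mean

# Per-machine-type values for DCASE2025_baseline_task2_MSE, copied from the
# table above in column order: AutoTrash, HomeCamera, ToyPet, ToyRCCar,
# BandSealer, CoffeeGrinder, Polisher, ScrewFeeder
auc_source = [0.8102, 0.8140, 0.677, 0.5284, 0.7198, 0.7304, 0.6686, 0.6486]
auc_target = [0.3436, 0.4976, 0.367, 0.6272, 0.3956, 0.4436, 0.443, 0.6592]
pauc = [
    0.5421052631578948, 0.5284210526315789, 0.55, 0.5552631578947368,
    0.5205263157894737, 0.5342105263157895, 0.5231578947368422,
    0.6210526315789474,
]

all_values = auc_source + auc_target + pauc
total_hmean = harmonic_mean(all_values)
total_amean = mean(all_values)

print(total_hmean)  # compare with TOTAL score, h-mean column
print(total_amean)  # compare with TOTAL score, a-mean column
```

The harmonic-mean value also matches the "official score" line in the per-system result csv shown earlier.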
If you use this system, please cite all the following four papers:
- Tomoya Nishida, Noboru Harada, Daisuke Niizumi, Davide Albertini, Roberto Sannino, Simone Pradolini, Filippo Augusti, Keisuke Imoto, Kota Dohi, Harsh Purohit, Takashi Endo, and Yohei Kawaguchi. Description and discussion on DCASE 2025 challenge task 2: first-shot unsupervised anomalous sound detection for machine condition monitoring. In arXiv e-prints: 2506.10097, 2025.
- Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, and Shoichiro Saito. ToyADMOS2: another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions. In Proceedings of the Detection and Classification of Acoustic Scenes and Events Workshop (DCASE), 1–5. Barcelona, Spain, November 2021.
- Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, and Yohei Kawaguchi. MIMII DG: sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task. In Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022). Nancy, France, November 2022.
- Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, and Masahiro Yasuda. First-shot anomaly detection for machine condition monitoring: a domain generalization baseline. In Proceedings of the 31st European Signal Processing Conference (EUSIPCO), pages 191–195, 2023.