This repository provides all the code, data, and supplementary materials referenced in the INTER-NOISE 2025 article:
Room Acoustics and Microphone Characteristics Show Systematic Impact on Sound Event Recognition
Gabriel Bibbó, Craig Cieciura, Mark D. Plumbley
Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, United Kingdom
The repository contains:
- Audio generation code (audio_generation.py): Scripts to generate the standardized 60-minute audio file from AudioSet, including pre-processing (normalization, compression, mixing of classes, concatenation).
- Annotations CSV (annotations.csv): A CSV file containing the YouTube video IDs, timestamps, and class labels used to construct the experimental audio segments.
- Experimental results CSV (results.csv): A CSV file with all frame-level results for every experimental configuration (room, microphone, class, etc.).
- Analysis result images:
  - Results from the first 30 minutes (single-class segments)
  - Results from the last 30 minutes (overlapping-class segments, focusing on one class of each pair)
  - Results from the last 30 minutes (overlapping-class segments, focusing on the complementary class)
These resources are provided so that the results discussed in the article can be inspected and reproduced.
room_acoustics_SED/
├── audio_generation.py
├── annotations.csv
├── results.csv
├── results_single_class.png
├── results_overlap_classA.png
├── results_overlap_classB.png
├── README.md
└── IN2025_1070751.pdf
- audio_generation.py: Scripts and notebooks to generate the audio and process annotations.
- annotations.csv: Metadata for each audio segment (YouTube ID, timestamp, class label).
- results.csv: Frame-level metrics and summary statistics from all experimental runs.
- results_single_class.png: Analysis of the first 30 minutes (single classes).
- results_overlap_classA.png: Analysis of the last 30 minutes, focusing on one class in each overlapping pair.
- results_overlap_classB.png: Analysis of the last 30 minutes, focusing on the complementary class.
- IN2025_1070751.pdf: The full article as submitted to INTER-NOISE 2025.
- Clone the repository:
    git clone https://github.com/gbibbo/room_acoustics_SED.git
    cd room_acoustics_SED
- Review the audio generation code.
  - All scripts for generating the experimental audio and processing YouTube metadata are in audio_generation/.
  - The main script is audio_generation.py. See its comments for usage instructions.
- Explore the data.
  - annotations.csv: Lists every YouTube video used, the time intervals, and the mapped class label.
  - results.csv: Contains all model outputs, including frame-level occurrence, mean probability, and confidence scores for each configuration.
- View analysis results.
  - Figures as shown in the article:
    - results_single_class.png: Performance for each class, room, and microphone (first 30 minutes).
    - results_overlap_classA.png: Overlapping classes, impact on the primary class (last 30 minutes).
    - results_overlap_classB.png: Overlapping classes, impact on the complementary class (last 30 minutes).
- The audio file was generated from AudioSet segments, grouped into 15 daily household sound classes.
- Segments were normalized, compressed, and concatenated to form:
  - 30 minutes of single-class audio (2 minutes per class)
  - 30 minutes of overlapping-class audio (15 unique class pairs, 2 minutes per pair)
- See the scripts in audio_generation/ for all processing details.
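The layout described above (15 single-class segments followed by 15 overlapping-pair segments, 2 minutes each) can be sketched as a simple timeline. This is only an illustration of the arithmetic, not the code in audio_generation.py: the class names are placeholders, and the neighbour-based pairing is a hypothetical choice made here just to obtain 15 unique pairs. The actual classes and pairs are those defined in the article.

```python
SEGMENT_SECONDS = 120  # 2 minutes per class or per pair, as stated above
NUM_CLASSES = 15


def build_timeline(classes, pairs, segment_seconds=SEGMENT_SECONDS):
    """Return a list of (start_s, end_s, labels) tuples covering the whole file."""
    timeline = []
    start = 0
    # Single-class segments come first, overlapping pairs second.
    for labels in [(c,) for c in classes] + [tuple(p) for p in pairs]:
        timeline.append((start, start + segment_seconds, labels))
        start += segment_seconds
    return timeline


# Placeholder names (assumption): the real class list is given in the article.
classes = [f"class_{i:02d}" for i in range(1, NUM_CLASSES + 1)]
# Hypothetical pairing (assumption): each class with its neighbour, 15 unique pairs.
pairs = [(classes[i], classes[(i + 1) % NUM_CLASSES]) for i in range(NUM_CLASSES)]

timeline = build_timeline(classes, pairs)
total_minutes = timeline[-1][1] / 60
print(f"{len(timeline)} segments, {total_minutes:.0f} minutes in total")
```

The first 15 entries end at 1800 s (30 minutes) and the last entry ends at 3600 s, which matches the 60-minute file.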
- Each 1-second segment is tracked in annotations.csv with:
  - YouTube video ID
  - Start/end timestamps
  - Assigned class(es)
  - Source information for traceability
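As a quick sanity check on the annotations, one might total the annotated seconds per class. The sketch below uses only the Python standard library. The column names (youtube_id, start_s, end_s, label) and the toy rows are assumptions for illustration; check the header of annotations.csv and adjust the arguments accordingly.

```python
import csv
import io
from collections import Counter


def seconds_per_label(csv_file, label_col="label", start_col="start_s", end_col="end_s"):
    """Sum the annotated duration (in seconds) for each class label."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        totals[row[label_col]] += float(row[end_col]) - float(row[start_col])
    return dict(totals)


# Toy rows with placeholder IDs (not real data), standing in for annotations.csv.
toy_csv = io.StringIO(
    "youtube_id,start_s,end_s,label\n"
    "PLACEHOLDER_A,10,11,class_01\n"
    "PLACEHOLDER_A,11,12,class_01\n"
    "PLACEHOLDER_B,30,31,class_02\n"
)
totals = seconds_per_label(toy_csv)
print(totals)
```

With the real file, the same function can be called on `open("annotations.csv", newline="")` once the column names are matched to the actual header.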
- results.csv contains:
  - Room, microphone, class, and overlap configuration
  - Frame-level detection occurrence (%)
  - Mean probability/confidence assigned to the correct class
  - SNR measurements for each configuration
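To compare configurations, the frame-level detection occurrence can be averaged per room and microphone. The following is a minimal standard-library sketch, not the analysis used in the article. The column names (room, microphone, class, occurrence_pct) and the toy values are assumptions; replace them with the actual header of results.csv.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_occurrence_by_config(csv_file, keys=("room", "microphone"),
                              value_col="occurrence_pct"):
    """Average the detection occurrence (%) over all rows sharing the same keys."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[tuple(row[k] for k in keys)].append(float(row[value_col]))
    return {config: mean(values) for config, values in groups.items()}


# Toy rows (not real results), standing in for results.csv.
toy_results = io.StringIO(
    "room,microphone,class,occurrence_pct\n"
    "room_A,mic_1,class_01,80\n"
    "room_A,mic_1,class_02,60\n"
    "room_B,mic_1,class_01,50\n"
)
summary = mean_occurrence_by_config(toy_results)
for config, value in sorted(summary.items()):
    print(config, f"{value:.1f}")
```

Passing a different `keys` tuple, for example `("room", "microphone", "class")`, gives a per-class breakdown with the same function.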
- Figures summarize the impact of room acoustics, microphone, and overlapping events, as described in the article:
  - results_single_class.png: Classes in isolation
  - results_overlap_classA.png: Overlaps, focus on primary class
  - results_overlap_classB.png: Overlaps, focus on secondary class
If you use this repository or data, please cite:
Bibbó, G., Cieciura, C., & Plumbley, M. D. (2025). Room Acoustics and Microphone Characteristics Show Systematic Impact on Sound Event Recognition. INTER-NOISE 2025.
All code and data are provided for academic/research use under a Creative Commons Attribution (CC BY) license.
See the LICENSE file for details.
For questions about the code or data, please contact Gabriel Bibbó: g.bibbo@surrey.ac.uk