Diletta Goglia, Davide Vega, Ece Calikus
diletta.goglia@it.uu.se (D.G.), davide.vega@it.uu.se (D.V.), ece.calikus@it.uu.se (E.C.)
UU-InfoLab, Department of Information Technology, Uppsala University, Uppsala, Sweden
Our model will be released soon! Stay in touch for updates.
This repository is actively maintained. For any questions or further information, please feel free to contact the corresponding author:
Diletta Goglia
Ph.D. Candidate at Uppsala University Information Laboratory (UU-InfoLab) research group.
Department of Information Technology, Uppsala University, Sweden.
diletta.goglia@it.uu.se
@dilettagoglia.bsky.social
Last update: July 2025
.
├── data/
│   ├── training_set.py
│   ├── test_set.py
│   └── youtube/              # 'Supernanny' TV show episodes
│       ├── data_raw/         # audio files of each episode (subdirs correspond to seasons)
│       ├── data_clean/       # diarization (transcripts + VAD)
│       └── data_analysis/    #
├── models/                   #
├── img/
├── annotations/
│   ├── to_annotate/          # csv files, automatically generated
│   ├── llama3/               # one csv per experiment
│   ├── gpt4/                 #
│   └── human-annotated/      # one csv file per annotator
├── src/
│   ├── __init__.py
│   ├── utilities.py
│   ├── youtube.py
│   ├── diarization.py
│   ├── generate_annot_files.py
│   ├── annot_agreement.py
│   ├── human_mach_comparison.py
│   ├── pseudolabeling.py
│   └── test.py
├── requirements.txt
└── README.md