LowResDys

Requirements

Clone this repo and install dependencies:

git clone https://github.com/sqrk/LowResDys.git
cd LowResDys
pip install -r requirements.txt

Scripts

Finetuning

Whisper

Go to whisper/finetune.py.
Change the variables at the beginning of the file if needed, e.g.

dataset_name = 'COPAS' #['COPAS', 'easycall', 'torgo', 'uaspeech', 'All', 'All_balanced']
model_name = 'openai/whisper-large-v3'
output_dir = f"./{dataset_name}-whisper-lg-3"
language = 'dutch' #['dutch', 'english', 'italian']
cache_dir = '/l/users/karima.kadaoui/.cache/huggingface'  # replace with a writable path on your machine

Run python ./whisper/finetune.py
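The language variable has to match the corpus being fine-tuned on. The sketch below encodes that pairing (COPAS is Dutch, EasyCall is Italian, TORGO and UASpeech are English) and reuses the output_dir pattern shown above. The helper name whisper_settings is hypothetical and is not part of whisper/finetune.py; it only shows one way to keep the variables consistent.

```python
# Hypothetical helper (not part of whisper/finetune.py): derive consistent
# settings for a single-corpus Whisper fine-tuning run.
WHISPER_LANGUAGE = {
    'COPAS': 'dutch',
    'easycall': 'italian',
    'torgo': 'english',
    'uaspeech': 'english',
}


def whisper_settings(dataset_name: str) -> dict:
    """Return the language and output_dir to use for one corpus."""
    if dataset_name not in WHISPER_LANGUAGE:
        # 'All' and 'All_balanced' presumably combine corpora in several
        # languages, so no single language value applies to them.
        raise ValueError(f"no single language for {dataset_name!r}")
    return {
        'dataset_name': dataset_name,
        'language': WHISPER_LANGUAGE[dataset_name],
        'output_dir': f"./{dataset_name}-whisper-lg-3",
    }


print(whisper_settings('easycall'))
```

The values returned can be copied into the variables at the top of whisper/finetune.py.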

MMS

Go to mms/finetune.py.
Change the variables at the beginning of the file if needed, e.g.

dataset_name = 'COPAS' #[torgo, uaspeech, easycall, COPAS, All, All_balanced]
model_name = 'facebook/mms-1b-all'
output_dir = f"./{dataset_name}-mms1ball"
language = 'nld' #[ita, nld, eng]
cache_dir = '/l/users/karima.kadaoui/.cache/huggingface'  # replace with a writable path on your machine

Run python ./mms/finetune.py
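The MMS scripts use ISO 639-3 language codes (nld, eng, ita) where the Whisper scripts use full language names (dutch, english, italian). If you run both model families on the same corpus, a small translation table avoids mismatches. This is a minimal sketch; mms_settings is a hypothetical name, not a function in mms/finetune.py.

```python
# Hypothetical helper (not part of mms/finetune.py): translate the Whisper
# language names used in whisper/*.py into the ISO 639-3 codes used by MMS.
WHISPER_TO_MMS = {'dutch': 'nld', 'english': 'eng', 'italian': 'ita'}


def mms_settings(dataset_name: str, whisper_language: str) -> dict:
    """Return the MMS language code and output_dir for a run."""
    try:
        code = WHISPER_TO_MMS[whisper_language]
    except KeyError:
        raise ValueError(f"unsupported language {whisper_language!r}") from None
    return {
        'language': code,
        'output_dir': f"./{dataset_name}-mms1ball",  # same pattern as above
    }


print(mms_settings('COPAS', 'dutch'))
# {'language': 'nld', 'output_dir': './COPAS-mms1ball'}
```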

Inference

Whisper

Go to whisper/inference.py.
Change the variables at the beginning of the file if needed, e.g.

setting = 'zshot' #[FT, zshot, FTMulti]
dataset_name = 'COPAS' #[torgo, uaspeech, easycall, COPAS, All, All_balanced]
model_name = 'openai/whisper-large-v3'
split = 'test'
language = 'dutch' #['english', 'italian', 'dutch']
cache_dir = '/l/users/karima.kadaoui/.cache/huggingface'  # replace with a writable path on your machine

Run python ./whisper/inference.py

MMS

Go to mms/inference.py.
Change the variables at the beginning of the file if needed, e.g.

model_name = 'facebook/mms-1b-all'
language = 'nld' #[eng, ita, nld]
dataset_name = "COPAS" #[torgo, uaspeech, easycall, COPAS, All, All_balanced]
setting = 'zshot' #[zshot, FT, FTMulti]
model_name = f'sqrk/{dataset_name}-mms1ball'  # note: this reassignment overrides model_name set above
split = 'test'

Run python ./mms/inference.py
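In the MMS inference example, model_name is assigned twice, so the second value always wins regardless of setting. If you would rather pick the model from setting than edit both lines, one possible approach is sketched below. It assumes that 'zshot' means evaluating the base facebook/mms-1b-all model without fine-tuning and that the fine-tuned settings use the sqrk/{dataset_name}-mms1ball pattern; this is an assumption and may not match what mms/inference.py actually does. resolve_mms_model is a hypothetical name.

```python
# Hypothetical helper (not part of mms/inference.py).
# Assumption: 'zshot' = base model without fine-tuning; 'FT' and 'FTMulti'
# load a fine-tuned checkpoint named with the pattern used in the example.
BASE_MODEL = 'facebook/mms-1b-all'
SETTINGS = ('zshot', 'FT', 'FTMulti')


def resolve_mms_model(setting: str, dataset_name: str) -> str:
    """Return the model id to load for a given setting and dataset."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting {setting!r}")
    if setting == 'zshot':
        return BASE_MODEL
    return f'sqrk/{dataset_name}-mms1ball'


print(resolve_mms_model('zshot', 'COPAS'))  # facebook/mms-1b-all
print(resolve_mms_model('FT', 'torgo'))     # sqrk/torgo-mms1ball
```

Choosing the model in one place keeps setting and model_name from drifting apart.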

Data

All datasets are uploaded to the Hugging Face dataset sqrk/dys_mixture (https://huggingface.co/datasets/sqrk/dys_mixture). You can download a specific dataset using:

from datasets import load_dataset

dataset_name = 'COPAS'
dataset = load_dataset("sqrk/dys_mixture", dataset_name)

where dataset_name can be any of ['COPAS', 'torgo', 'uaspeech', 'easycall', 'All', 'All_balanced']

You do not need to download the data before running the finetuning or inference scripts; the scripts download it automatically.

Checkpoints

Dataset	Whisper checkpoint	MMS checkpoint
COPAS	sqrk/COPAS-whisper-lg-3-Nov29	sqrk/COPAS-mms1ball-Nov30
EasyCall	sqrk/easycall-whisper-lg-3-Nov29	sqrk/easycall-mms1ball-Nov30
TORGO	sqrk/torgo-whisper-lg-3-Nov29	sqrk/torgo-mms1ball-Nov30
UASpeech	sqrk/uaspeech-whisper-lg-3-Nov29	sqrk/uaspeech-mms1ball-Nov30
Multi (All)	sqrk/All-lang_tag-whisper-lg-3-Nov30	sqrk/All-mms1ball-Dec1
Multi_B (All_balanced)	sqrk/All_balanced-lang_tag-whisper-lg-3-Nov30	sqrk/All_balanced-mms1ball-Dec1
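For scripting, the checkpoint table can be kept as a lookup keyed by the same dataset_name values the scripts use. The sketch below copies the Hub ids from the table verbatim; treating the Multi and Multi_B rows as the 'All' and 'All_balanced' datasets is inferred from the checkpoint names. The function name checkpoint is hypothetical.

```python
# The checkpoint table as a lookup: dataset_name -> (Whisper id, MMS id).
CHECKPOINTS = {
    'COPAS': ('sqrk/COPAS-whisper-lg-3-Nov29', 'sqrk/COPAS-mms1ball-Nov30'),
    'easycall': ('sqrk/easycall-whisper-lg-3-Nov29', 'sqrk/easycall-mms1ball-Nov30'),
    'torgo': ('sqrk/torgo-whisper-lg-3-Nov29', 'sqrk/torgo-mms1ball-Nov30'),
    'uaspeech': ('sqrk/uaspeech-whisper-lg-3-Nov29', 'sqrk/uaspeech-mms1ball-Nov30'),
    'All': ('sqrk/All-lang_tag-whisper-lg-3-Nov30', 'sqrk/All-mms1ball-Dec1'),
    'All_balanced': ('sqrk/All_balanced-lang_tag-whisper-lg-3-Nov30',
                     'sqrk/All_balanced-mms1ball-Dec1'),
}


def checkpoint(dataset_name: str, family: str) -> str:
    """Return the Hub id for model family 'whisper' or 'mms'."""
    whisper_id, mms_id = CHECKPOINTS[dataset_name]
    if family == 'whisper':
        return whisper_id
    if family == 'mms':
        return mms_id
    raise ValueError(f"unknown model family {family!r}")


print(checkpoint('torgo', 'mms'))  # sqrk/torgo-mms1ball-Nov30
```

Each returned value is a Hugging Face Hub model id; passing it as model_name in the inference scripts is a plausible use but an untested assumption.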
