Description
Hi, I want to fine‑tune NISQA for my own domain, focusing mainly on MOS prediction (not necessarily all sub‑metrics).
I am considering fine‑tuning only the last layer to adapt the model without full retraining. Could you please clarify the following requirements/recommendations for this scenario?
- Minimum dataset size
  - Approximate number of MOS‑labeled samples required for last‑layer fine‑tuning.
  - How does this change for full model fine‑tuning?
- Audio characteristics
  - Recommended sampling rate.
  - Ideal audio length range for training (min/max duration).
  - Should I normalize audio length or use original duration?
- Label requirements
  - Should MOS labels be per‑utterance or averaged across multiple raters?
- Other recommendations
  - Whether audio should be mono or stereo.
  - Any preprocessing steps NISQA expects before training.
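
To make the label question concrete, here is a minimal sketch of how I would turn individual rater votes into one averaged MOS per utterance. The file names, the `(utterance_id, score)` pair layout, and the `aggregate_mos` helper are my own illustration and are not part of NISQA.

```python
from collections import defaultdict
from statistics import mean


def aggregate_mos(ratings):
    """Average individual rater scores into one MOS per utterance.

    `ratings` is an iterable of (utterance_id, score) pairs, one per rater vote.
    Returns {utterance_id: (mos, n_raters)}.
    """
    by_utt = defaultdict(list)
    for utt_id, score in ratings:
        by_utt[utt_id].append(float(score))
    return {utt: (mean(scores), len(scores)) for utt, scores in by_utt.items()}


# Hypothetical example data: three raters for a.wav, two for b.wav.
ratings = [("a.wav", 4), ("a.wav", 5), ("a.wav", 3), ("b.wav", 2), ("b.wav", 3)]
mos = aggregate_mos(ratings)
```

Keeping the rater count next to each averaged score would let me filter out utterances with too few votes, if that matters for training stability.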
I mainly want to avoid overfitting and ensure that my dataset meets the minimum requirements for stable MOS prediction. Any guidance, even approximate, would be greatly appreciated.
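
In case it helps to answer with concrete numbers, this is a rough sketch of how I can summarize my dataset's sampling rates, channel counts, and duration range. It uses only Python's standard `wave` module, so it assumes uncompressed PCM WAV files; the `describe_wav` and `summarize` helpers are hypothetical names of my own and not NISQA functions.

```python
import wave


def describe_wav(path):
    """Return sample rate, channel count, and duration in seconds of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return {"sample_rate": rate, "channels": channels, "duration_s": duration}


def summarize(paths):
    """Summarize a non-empty list of WAV paths (raises ValueError if empty)."""
    if not paths:
        raise ValueError("no files to summarize")
    infos = [describe_wav(p) for p in paths]
    durations = [info["duration_s"] for info in infos]
    return {
        "n_files": len(infos),
        "sample_rates": sorted({info["sample_rate"] for info in infos}),
        "channel_counts": sorted({info["channels"] for info in infos}),
        "min_duration_s": min(durations),
        "max_duration_s": max(durations),
    }
```

Calling `summarize` on the list of training file paths gives one dictionary that I can paste here, so any recommendation can be checked against what the data actually looks like.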
Thank you!