Fine-tuning the model

Hi, I want to fine‑tune NISQA for my own domain, focusing mainly on MOS prediction (not necessarily all sub‑metrics).
I am considering fine‑tuning only the last layer to adapt the model without full retraining. Could you please clarify the following requirements/recommendations for this scenario?
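
To make clear what I mean by last-layer fine-tuning, here is a minimal PyTorch sketch. `TinyMosModel`, its `backbone`/`head` attributes, and `freeze_all_but_head` are hypothetical stand-ins I made up for illustration; they are not NISQA's actual architecture or API, and the learning rate is an arbitrary placeholder.

```python
import torch
from torch import nn


class TinyMosModel(nn.Module):
    """Stand-in model (NOT NISQA): a feature backbone plus one regression layer."""

    def __init__(self, n_features: int = 16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.head = nn.Linear(32, 1)  # the "last layer" that outputs the MOS estimate

    def forward(self, x):
        return self.head(self.backbone(x)).squeeze(-1)


def freeze_all_but_head(model):
    """Disable gradients everywhere except the final layer; return trainable params."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.head.parameters():
        p.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]


model = TinyMosModel()
trainable = freeze_all_but_head(model)
optimizer = torch.optim.Adam(trainable, lr=1e-3)  # only the head is optimized

# One dummy training step on random data, just to show the mechanics.
x = torch.randn(8, 16)           # 8 fake feature vectors
y = torch.rand(8) * 4 + 1        # 8 fake MOS labels in [1, 5]
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

n_trainable = sum(p.numel() for p in trainable)  # 32 weights + 1 bias
```

In this toy setup only 33 parameters are updated, which is why I expect the data requirement to be much lower than for full fine-tuning.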

1) Minimum dataset size

Approximate number of MOS‑labeled samples required for last‑layer fine‑tuning.
How does this change for full model fine‑tuning?

2) Audio characteristics

Recommended sampling rate. 
Ideal audio length range for training (min/max duration).
Should I normalize audio length or use original duration?

3) Label requirements

Should MOS labels be per‑utterance or averaged across multiple raters?
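
For context, my raw data consists of individual ratings, so averaging across raters would look roughly like the sketch below. The file names, rater IDs, scores, and the `mos_per_file` helper are made up for illustration, and the 1 to 5 scale is an assumption about my own listening test.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual ratings: (file name, rater id, score on a 1-5 scale).
ratings = [
    ("a.wav", "r1", 4.0), ("a.wav", "r2", 5.0), ("a.wav", "r3", 3.0),
    ("b.wav", "r1", 2.0), ("b.wav", "r2", 2.0),
]


def mos_per_file(rows):
    """Group scores by file and return {file: (mean score, number of ratings)}."""
    by_file = defaultdict(list)
    for fname, _rater, score in rows:
        by_file[fname].append(score)
    return {fname: (mean(scores), len(scores)) for fname, scores in by_file.items()}


mos = mos_per_file(ratings)
```

Keeping the rating count next to each mean would let me drop files with too few raters, if that is recommended.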

4) Other recommendations

Whether audio should be mono or stereo.
Any preprocessing steps NISQA expects before training.
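
Here is a rough sketch of how I currently inspect my files, using only Python's standard-library `wave` module on 16-bit PCM WAV data. The helper names `describe_wav` and `downmix_stereo_16bit` are my own, the synthetic 48 kHz stereo clip is only test data, and none of this reflects what NISQA itself does or expects.

```python
import io
import struct
import wave


def describe_wav(file_obj):
    """Return channel count, sample rate, and duration (seconds) of a WAV file."""
    with wave.open(file_obj, "rb") as w:
        n_channels = w.getnchannels()
        rate = w.getframerate()
        n_frames = w.getnframes()
    return {"channels": n_channels, "sample_rate": rate, "duration_s": n_frames / rate}


def downmix_stereo_16bit(frames: bytes) -> bytes:
    """Average interleaved left/right 16-bit samples into a mono stream."""
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)]
    return struct.pack("<%dh" % len(mono), *mono)


# Build a synthetic 0.5 s stereo clip in memory (24000 frames at 48 kHz).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)   # 16-bit samples
    w.setframerate(48000)
    w.writeframes(struct.pack("<4h", 100, 300, -200, 0) * 12000)

buf.seek(0)
info = describe_wav(buf)

# Downmix two stereo frames: (100, 300) and (-200, 0).
mono_bytes = downmix_stereo_16bit(struct.pack("<4h", 100, 300, -200, 0))
mono_samples = struct.unpack("<2h", mono_bytes)
```

If a specific sampling rate or duration range is required, I would add a filter on `sample_rate` and `duration_s` at this point.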

I mainly want to avoid overfitting and ensure that my dataset meets the minimum requirements for stable MOS prediction. Any guidance, even approximate, would be greatly appreciated.

Thank you!


Fine-tuning the model #61
