
fine-tuning the model #61

@Mukthav03


Hi, I want to fine‑tune NISQA for my own domain, focusing mainly on MOS prediction (not necessarily all sub‑metrics).
I am considering fine‑tuning only the last layer to adapt the model without full retraining. Could you please clarify the following requirements/recommendations for this scenario?
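For context, this is roughly what I mean by last‑layer fine‑tuning, as a minimal PyTorch sketch. The checkpoint path, the `build_nisqa_model` helper, and the `output_head` attribute are only placeholders on my side, since I'm not sure of the exact layer names in the NISQA architecture:

```python
import torch

# Minimal sketch of last-layer fine-tuning; names below are placeholders,
# not the actual NISQA API.
checkpoint = torch.load("weights/nisqa.tar", map_location="cpu")  # illustrative path
model = build_nisqa_model(checkpoint)  # hypothetical helper that rebuilds the network

# Freeze the whole network ...
for p in model.parameters():
    p.requires_grad = False

# ... and unfreeze only the final MOS regression layer.
for p in model.output_head.parameters():  # placeholder attribute name
    p.requires_grad = True

# Optimize only the parameters that remain trainable.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,
)
```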

  1. Minimum dataset size

Approximate number of MOS‑labeled samples required for last‑layer fine‑tuning.
How does this change for full model fine‑tuning?

  2. Audio characteristics

Recommended sampling rate.
Ideal audio length range for training (min/max duration).
Should I normalize audio length or use original duration?

  3. Label requirements

Should MOS labels be per‑utterance or averaged across multiple raters?

  4. Other recommendations
    Whether audio should be mono or stereo.
    Any preprocessing steps NISQA expects before training (see the preprocessing sketch after this list).
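For reference, this is the kind of preprocessing I currently have in mind. It assumes a 48 kHz target rate and mono downmixing, which may or may not match what the NISQA training pipeline actually expects:

```python
import torch
import torchaudio

def load_for_nisqa(path: str, target_sr: int = 48000) -> torch.Tensor:
    """Illustrative preprocessing: downmix to mono and resample.

    The 48 kHz target rate is an assumption on my side, not a confirmed
    NISQA requirement.
    """
    wav, sr = torchaudio.load(path)
    if wav.shape[0] > 1:                      # stereo -> mono downmix
        wav = wav.mean(dim=0, keepdim=True)
    if sr != target_sr:                       # resample to the assumed target rate
        wav = torchaudio.transforms.Resample(sr, target_sr)(wav)
    return wav
```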

I mainly want to avoid overfitting and ensure that my dataset meets the minimum requirements for stable MOS prediction. Any guidance, even approximate, would be greatly appreciated.

Thank you!
