What data split ratio was used to train canary-1b-flash? #12996

sh1man999 · 2025-04-13T02:51:00Z

I'm interested in how the dataset was split during the training of the canary-1b-flash model. What percentage of data was allocated to the training set, what percentage to the validation set, and what percentage to the test set? Was it a standard ratio like 80/10/10 or was something different used?

Answered by ankitapasad

May 2, 2025

ankitapasad · 2025-05-02T00:21:04Z

Collaborator

Hi @sh1man999

Thank you for your interest!

Please refer to the Canary paper for a detailed discussion of the training and test data. To monitor performance during training, we use LibriSpeech and MCV (Mozilla Common Voice) for multilingual ASR, and MuST-C and Europarl for AST.
