What was the data split ratio used in training canary-1b-flash? #12996
I'm interested in how the dataset was split during the training of the canary-1b-flash model. What percentage of data was allocated to the training set, what percentage to the validation set, and what percentage to the test set? Was it a standard ratio like 80/10/10 or was something different used?
Replies: 1 comment
Hi @sh1man999, thank you for your interest! Please refer to the Canary paper for a detailed discussion of the training and test data. To monitor performance during training, we use LibriSpeech and MCV for multilingual ASR, and MUSTC and Europarl for AST.