Description
Hello,
First of all, thank you so much for sharing this model!
I have a few questions regarding the training and usage details:
- Training configuration – Could you please share some information about the training setup (GPU type, total training time, batch size, number of epochs, etc.)?
- Audio format – What sampling rate was used for the training audio (e.g., 16 kHz, 24 kHz, or 44.1 kHz)?
- Inference performance – What is the current inference speed on your GPU? Have you tried frameworks such as vLLM to optimize runtime performance?
- Multi-speaker training – Have you experimented with multi-speaker training yet?
- Dataset format – Would it be possible to release the dataset (or a subset) in a format like Parquet, containing fields such as `audio`, `transcript`, and `sampling_rate`?
Thanks again for your contribution!
Best regards,