Open
Hello,
I'm very interested in your great work! I have three questions; would you mind helping me with them?
- I was wondering whether your network could be extended to other kinds of audio data, such as music. To this end, I tested it on two instrument datasets. In test 1, I had 33 instruments (playing the role of the "speakers" in your case), each with only about 3 minutes of audio. In test 2, I had 11 instruments, each with about 1 hour of audio. So test 1 has much less audio in total than test 2. However, when I ran both experiments on the same machine, test 2 ran ~2-3 times faster than test 1. Does the training speed depend more on the number of "speakers" than on the total duration of the training data?
- What duration would you recommend for each .wav file? In your dataset, each training clip is rather short (~2-3 seconds). Would your network also work on long clips, such as 1-2 minutes?
- For inference, you set `infer_step` to 8 with a specific `infer_schedule`. Is 8 the best value in your experiments? If we want to try a different `infer_step`, how should we set the `infer_schedule`?
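
To make the second question concrete, one option would be to cut long recordings into short clips of about the same length as those in your dataset. Below is a minimal sketch of that idea. The function name `split_into_segments`, the 3-second clip length, and the 16 kHz sample rate are assumptions for illustration only and are not taken from your code.

```python
from typing import List, Sequence


def split_into_segments(
    samples: Sequence[float],
    sample_rate: int,
    segment_seconds: float,
) -> List[Sequence[float]]:
    """Cut a mono waveform into consecutive, non-overlapping clips.

    Any trailing audio shorter than one full clip is dropped.
    """
    segment_len = int(sample_rate * segment_seconds)
    if segment_len <= 0:
        raise ValueError("segment length must be at least one sample")
    n_full = len(samples) // segment_len
    return [
        samples[i * segment_len:(i + 1) * segment_len]
        for i in range(n_full)
    ]


# Hypothetical example: a 10-second silent recording at 16 kHz,
# cut into 3-second clips (the last 1 second is discarded).
sample_rate = 16000
recording = [0.0] * (10 * sample_rate)
clips = split_into_segments(recording, sample_rate, 3.0)
```

Whether clips cut this way would train as well as the original short utterances is exactly what I am unsure about.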
Thank you very much for your help in advance!