Tricks to prepare the training dataset

Hello,
I'm very interested in your great work! I have three questions and would appreciate your help with them.
1. I was wondering whether your network could be extended to other kinds of audio data, such as music. To this end, I tested it on two instrument datasets. In test 1, I had 33 instruments (each playing the role of a "speaker" in your setup), each with only about 3 minutes of audio. In test 2, I had 11 instruments, each with about 1 hour of audio. So the total amount of audio in test 1 is smaller than in test 2. However, when I ran both experiments on the same machine, test 2 ran about 2-3 times faster than test 1. Does the training speed depend more on the number of "speakers" than on the total duration of the training data?
2. What duration would you recommend for each .wav file? In your dataset, each training clip is rather short (~2-3 seconds). Would your network also work for longer clips, such as 1-2 minutes?
3. For inference, you set `infer_step` to 8 with a specific `infer_schedule`. Was 8 the best value in your experiments? If we want to try a different `infer_step`, how should we set `infer_schedule`?
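
To make the comparison in questions 1 and 2 concrete, here is a minimal sketch that tallies the number of files and the total duration per instrument. It assumes a hypothetical layout of one sub-directory per instrument containing PCM `.wav` files; the function name `summarize_dataset` and the directory layout are illustrative only and are not taken from the repository.

```python
import wave
from pathlib import Path


def summarize_dataset(root):
    """Return {instrument: (num_files, total_seconds)} for a directory tree.

    Assumes a hypothetical layout root/<instrument>/*.wav with PCM files.
    """
    summary = {}
    for instrument_dir in sorted(Path(root).iterdir()):
        if not instrument_dir.is_dir():
            continue  # skip stray files next to the instrument folders
        num_files = 0
        total_seconds = 0.0
        for wav_path in sorted(instrument_dir.glob("*.wav")):
            with wave.open(str(wav_path), "rb") as wav_file:
                # duration in seconds = number of frames / sample rate
                total_seconds += wav_file.getnframes() / wav_file.getframerate()
            num_files += 1
        summary[instrument_dir.name] = (num_files, total_seconds)
    return summary
```

Running this on both test sets and printing, for each instrument, the file count, total seconds, and average clip length would show whether the two sets differ mainly in the number of instruments, the number of files, or the length of each file.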

Thank you very much for your help in advance!

Tricks to prepare the training dataset #5

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Tricks to prepare the training dataset #5

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions