Skip to content

Tricks to prepare the training dataset #5

@adagio715

Description

@adagio715

Hello,
I'm very interested in your great work! I have 3 questions, would you mind helping me with them?

  1. I was wondering whether your network could be extended to use on other kinds of audio data, such as music. To this end, I tested it on different instrument datasets. In test 1, I had 33 instruments (like the "speaker" in your case), each containing only about 3 minutes of audio data. In test 2, I had 11 instruments, each containing about 1 hour of audio data. So overall, audio data in test1 is shorter than audio data in test2. However, when I ran the experiments on the same machine, test2 ran ~2-3 times faster than test1. Does the training speed have something to do with the number of "speakers" more than with the whole duration of the training data?
  2. How would you recommend the duration for each .wav file. In your dataset, each piece of training data is rather short (~2-3 seconds). Would your network also work for long data such as 1-2 minutes?
  3. For inference, you set the infer_step as 8 with a specific infer_schedule. Is 8 the best parameter in your experiments? If we want to test different infer_step, how should we set the infer_schedule?

Thank you very much for your help in advance!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions