
Issue with training on my own dataset #20

@rishabhjain16

I have been trying to train the FragmentVC model on my own dataset. It works fine with the VCTK dataset, but with my own dataset I get the error below. It may have something to do with my dataset and its structure. My dataset is non-native English speech, and I want to find out whether I can do VC from, say, LibriSpeech to non-native English and vice versa. I am not quite sure how to fix this.

root@06089af1684b:/workspace/vc/FragmentVC# CUDA_VISIBLE_DEVICES=1 python train.py features_myst --save_dir ./ckpts_myst --batch_size 16 --preload
100% 17163/17163 [00:18<00:00, 913.63it/s]
Train:   0% 0/1000 [00:00<?, ? step/s]Traceback (most recent call last):
  File "train.py", line 247, in <module>
    main(**parse_args())
  File "train.py", line 166, in main
    batch = next(train_iterator)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1065, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 272, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/workspace/vc/FragmentVC/data/intra_speaker_dataset.py", line 73, in __getitem__
    for sampled_id in random.sample(utterance_indices, self.n_samples):
  File "/opt/conda/lib/python3.8/random.py", line 363, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

Train:   0% 0/1000 [00:01<?, ? step/s]

I think it most probably has something to do with the structure of my dataset, or perhaps with the length of the audio files. I tried looking around but didn't find any working solution. Any help is appreciated. Thanks in advance.
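Update: from the traceback, it looks like random.sample in intra_speaker_dataset.py is being asked for more utterances (n_samples) than at least one speaker in my dataset actually has. A quick check along these lines should confirm it; note that the metadata.json layout below is just my guess at what preprocess outputs, so the keys may need adjusting:

import json
from pathlib import Path

# Rough sketch: I'm assuming metadata.json maps each speaker id to a list
# of utterance entries; adjust to the actual layout of the features dir.
metadata = json.loads(Path("features_myst/metadata.json").read_text())
for speaker, utterances in sorted(metadata.items()):
    if len(utterances) < 5:  # 5 is a placeholder for the dataset's n_samples
        print(f"{speaker}: only {len(utterances)} utterance(s)")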
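If that is the cause, a stopgap would be to sample with replacement whenever a speaker has fewer utterances than n_samples. A minimal sketch of what I mean (not the repo's actual code):

import random

def sample_utterances(utterance_indices, n_samples):
    # random.sample raises "Sample larger than population or is negative"
    # when n_samples > len(utterance_indices); random.choices samples with
    # replacement, so it works for any non-empty population.
    if len(utterance_indices) >= n_samples:
        return random.sample(utterance_indices, n_samples)
    return random.choices(utterance_indices, k=n_samples)

That said, filtering out speakers with too few utterances, or lowering n_samples, is probably the cleaner fix.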
