Hello there,
Thanks for your efforts in open-sourcing the code; it's vital for us in trying to reproduce the results presented in the paper.
Problem
I've come across a `RuntimeError` when adapting the model to our private data:
```
/*/EEND-vector-clustering/eend/pytorch_backend/train.py:186: RuntimeWarning: invalid value encountered in true_divide
  fet_arr[spk] = org / norm
...
Traceback (most recent call last):
...
RuntimeError: The loss (nan) is not finite.
```
Detail
After some debugging, I found that the problem actually happens during backpropagation when an entry of the embedding array is left all zeros: the L2 norm of a zero vector is zero, so the normalization below divides zero by zero and fills that entry with NaNs, which later makes the loss NaN:
EEND-vector-clustering/eend/pytorch_backend/train.py
Lines 173 to 186 in b3649ee
```python
fet_arr = np.zeros([spk_num, fet_dim])
# sum
bs = spklabs.shape[0]
for i in range(bs):
    if spkidx_tbl[spklabs[i]] == -1:
        raise ValueError(spklabs[i])
    fet_arr[spkidx_tbl[spklabs[i]]] += spkvecs[i]
# normalize
for spk in range(spk_num):
    org = fet_arr[spk]
    norm = np.linalg.norm(org, ord=2)
    fet_arr[spk] = org / norm
```
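For reference, the failure mode can be reproduced with plain NumPy, independent of the repo: normalizing an all-zero row divides zero by zero, which emits exactly the `RuntimeWarning` above and leaves NaNs behind.

```python
import numpy as np

fet_arr = np.zeros([1, 4])         # an entry that never accumulated any embedding
org = fet_arr[0]
norm = np.linalg.norm(org, ord=2)  # 0.0 for an all-zero vector
fet_arr[0] = org / norm            # RuntimeWarning: invalid value encountered ...
print(fet_arr[0])                  # [nan nan nan nan]
```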
Since these embeddings are loaded from the speaker embeddings dumped by the save_spkv_lab.py script when adapting the model, I suspected there might be an issue in the save_spkv_lab function.
After some careful step-by-step checking with pdb, I found that silent speaker labels are actually added to the all_labels variable when dumping the speaker embeddings:
EEND-vector-clustering/eend/pytorch_backend/infer.py
Lines 349 to 355 in b3649ee
```python
for i in range(args.num_speakers):
    # Exclude samples corresponding to silent speaker
    if torch.sum(t_chunked_t[sigma[i]]) > 0:
        vec = outputs[i+1][0].cpu().detach().numpy()
        lab = chunk_data[2][sigma[i]]
        all_outputs.append(vec)
        all_labels.append(lab)
```
Even when `torch.sum(t_chunked_t[sigma[i]]) > 0`, `lab` can still be -1, which is considered a silent speaker according to the code in:
EEND-vector-clustering/eend/pytorch_backend/diarization_dataset.py
Lines 94 to 99 in b3649ee
```python
S_arr = -1 * np.ones(n_speakers).astype(np.int64)
for seg in filtered_segments:
    speaker_index = speakers.index(self.data.utt2spk[seg['utt']])
    all_speaker_index = self.all_speakers.index(
        self.data.utt2spk[seg['utt']])
    S_arr[speaker_index] = all_speaker_index
```
Since these silent speaker labels are -1 and Python lists support negative indexing, the issue passes silently when dumping the embeddings but leads to the exception above once training begins.
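To illustrate why nothing fails at dump time, here is a toy example (the table contents are made up): a -1 label wraps around to the last entry of the lookup table, so the `spkidx_tbl[spklabs[i]] == -1` check in train.py never fires and the vector is silently accumulated under the wrong speaker.

```python
import numpy as np

spkidx_tbl = np.array([0, 1, 2])  # hypothetical remapping table
lab = -1                          # silent-speaker label leaked into the dump
# Index -1 wraps around to the last element instead of raising,
# so the `== -1` sanity check passes.
print(spkidx_tbl[lab])            # 2
```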
Question
I could simply fix this issue by appending a sample's label to `all_labels` only if `lab >= 0` (i.e., skipping silent speakers) when saving the speaker embeddings, and the subsequent training process then continues smoothly, resulting in a well-performing model.
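Concretely, the fix I have in mind is just an extra guard in the infer.py loop quoted above (a sketch of the idea, not a tested patch):

```python
for i in range(args.num_speakers):
    # Exclude samples corresponding to silent speaker
    if torch.sum(t_chunked_t[sigma[i]]) > 0:
        lab = chunk_data[2][sigma[i]]
        # Additional guard: a label of -1 still marks a silent speaker,
        # so skip it instead of letting it leak into the dumped labels.
        if lab < 0:
            continue
        vec = outputs[i+1][0].cpu().detach().numpy()
        all_outputs.append(vec)
        all_labels.append(lab)
```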
But before opening a PR, I would like to know whether you have ever come across this issue, or whether you have any idea why it happens.
Thanks!