Variational Bayesian GAN for Speaker-Recognition

NIST i-vector machine learning challenge

The NIST i-vector dataset contains 4958 speakers.
We counted the number of speakers that have fewer than n samples in the dataset, where n ranges from 2 to 10. Table 6.3 shows the result. About one fifth of the speakers have only one sample, so the dataset is unbalanced.
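
The counting step can be sketched as follows. This is a minimal illustration and not the code from this repository: the function name `speakers_below` and the toy labels are assumptions made for the example, and the real speaker labels would come from the NIST i-vector data.

```python
from collections import Counter


def speakers_below(labels, thresholds=range(2, 11)):
    """Return {n: number of speakers with fewer than n samples}.

    `labels` holds one speaker id per sample (hypothetical input format).
    """
    per_speaker = Counter(labels)  # speaker id -> number of samples
    return {
        n: sum(1 for count in per_speaker.values() if count < n)
        for n in thresholds
    }


# Toy labels for illustration only, not the NIST data.
toy_labels = ["a", "b", "b", "c", "c", "c"]
counts = speakers_below(toy_labels, thresholds=range(2, 5))
print(counts)
```

The entry for n = 2 is the number of speakers with exactly one sample, which is the quantity behind the "one fifth" observation above.
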
If a speaker has fewer than n samples, we supplement that speaker's samples up to n, where n ranges from 2 to 5.
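
A rough sketch of this supplementing rule is given below. It is an assumption-laden illustration and not the repository's implementation: `supplement` and `sample_fn` are hypothetical names, and `sample_fn(speaker, k)` stands in for whatever trained generator (GAN or VBGAN) produces k new i-vectors for a speaker.

```python
def supplement(samples_by_speaker, n, sample_fn):
    """Top up every speaker with fewer than n samples to exactly n.

    samples_by_speaker: dict mapping speaker id -> list of samples.
    sample_fn: hypothetical callable (speaker, k) -> k generated samples.
    Speakers that already have n or more samples are left unchanged.
    """
    augmented = {}
    for speaker, samples in samples_by_speaker.items():
        missing = max(0, n - len(samples))  # how many samples to add
        extra = list(sample_fn(speaker, missing)) if missing else []
        augmented[speaker] = list(samples) + extra
    return augmented


def toy_generator(speaker, k):
    # Placeholder generator for the example: returns labelled strings
    # instead of real i-vectors.
    return [f"{speaker}_gen{i}" for i in range(k)]


toy_data = {"a": ["a0"], "b": ["b0", "b1"], "c": ["c0", "c1", "c2", "c3"]}
augmented = supplement(toy_data, n=3, sample_fn=toy_generator)
```

In this toy run, speaker "a" receives two generated samples, speaker "b" receives one, and speaker "c" is untouched because it already has more than n samples.
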

Table 6.4 and Table 6.5 show the PLDA evaluation with different latent variable dimensions after data augmentation.

We find that, after data augmentation, our proposed VBGAN improves PLDA more than the vanilla GAN does. However, there is an unexpected effect: as the number of added samples increases, VBGAN no longer improves PLDA. A possible reason is that the augmented data for each speaker lack diversity, so the model fits certain speaker features too closely.
