# Resources

---

## Paper
- [\[2110.13900\] WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900)
- [\[1904.08104\] RawNet: Advanced end-to-end deep neural network for speaker verification](https://arxiv.org/abs/1904.08104)
- [Improved RawNet with Feature Map Scaling for Text-Independent Speaker Verification Using Raw Waveforms (ISCA Interspeech 2020)](https://www.isca-archive.org/interspeech_2020/jung20c_interspeech.html)
- [\[2203.08488\] Pushing the limits of raw waveform speaker recognition](https://arxiv.org/abs/2203.08488)
- [\[2406.07103\] MR-RawNet: multiple temporal resolutions for variable duration utterances](https://arxiv.org/abs/2406.07103)
- [\[2011.01108\] End-to-end anti-spoofing with RawNet2](https://arxiv.org/abs/2011.01108)
- [\[1808.00158\] Speaker Recognition from Raw Waveform with SincNet](https://arxiv.org/abs/1808.00158)
- [SincConv in SE](https://arxiv.org/pdf/2403.01785)
- [\[1709.01507\] Squeeze-and-Excitation Networks](https://arxiv.org/abs/1709.01507)
- [\[1803.10963\] Attentive Statistics Pooling for Deep Speaker Embedding](https://arxiv.org/abs/1803.10963)
- [\[2011.05189\] Supervised attention for speaker recognition](https://arxiv.org/abs/2011.05189)
- [Enhancing speaker identification through reverberation modeling and cancelable techniques using ANNs | PLOS ONE](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0294235)
- [ISCA Archive - Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification](https://www.isca-archive.org/interspeech_2022/zhang22j_interspeech.html)
- [VoxCeleb: Large-scale speaker verification in the wild](https://www.robots.ox.ac.uk/~vgg/publications/2019/Nagrani19/nagrani19.pdf)
- [VoxCeleb2: Deep Speaker Recognition (Chung et al., Interspeech 2018)](https://www.robots.ox.ac.uk/~vgg/publications/2018/Chung18a/chung18a.pdf)
- [VoxCeleb: a large-scale speaker identification dataset (Nagrani et al., Interspeech 2017)](https://www.robots.ox.ac.uk/~vgg/publications/2017/Nagrani17/nagrani17.pdf)
- [\[2408.14886\] The VoxCeleb Speaker Recognition Challenge: A Retrospective](https://arxiv.org/abs/2408.14886)
- [\[1912.07875\] Libri-Light: A Benchmark for ASR with Limited or No Supervision](https://arxiv.org/abs/1912.07875)
- [\[2106.06909\] GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio](https://arxiv.org/abs/2106.06909)
- [\[2101.00390\] VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)
- [X-Vectors: Robust DNN Embeddings for Speaker Recognition (Snyder et al., ICASSP 2018)](https://www.danielpovey.com/files/2018_icassp_xvectors.pdf)
- [\[2005.07143\] ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification](https://arxiv.org/abs/2005.07143)
- [\[1706.08612\] VoxCeleb: a large-scale speaker identification dataset](https://arxiv.org/abs/1706.08612)
- [\[2401.17230v2\] ESPnet-SPK: full pipeline speaker embedding toolkit... (arXiv)](https://arxiv.org/abs/2401.17230v2)
- [\[2407.18223\] Reshape Dimensions Network for Speaker Recognition](https://arxiv.org/abs/2407.18223)
- [VoxCeleb: Large-scale speaker verification in the wild (Computer Speech and Language, Vol. 60)](https://dl.acm.org/doi/10.1016/j.csl.2019.101027)
- [\[1912.02522\] VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge](https://arxiv.org/abs/1912.02522)
- [\[1904.08779\] SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition](https://arxiv.org/abs/1904.08779)
- [WavLM model ensemble for audio deepfake detection](https://arxiv.org/html/2408.07414v1)

---

## GitHub
- [unilm/wavlm at master · microsoft/unilm](https://github.com/microsoft/unilm/tree/master/wavlm)
- [unilm/wavlm/modules.py at master · microsoft/unilm](https://github.com/microsoft/unilm/blob/master/wavlm/modules.py)
- [libri-light/data_preparation/README.md at main · facebookresearch/libri-light](https://github.com/facebookresearch/libri-light/blob/main/data_preparation/README.md)
- [bunyaminergen/WavLMMSDD](https://github.com/bunyaminergen/WavLMMSDD)
- [Jungjee/RawNet: Official repository for RawNet, RawNet2, and RawNet3](https://github.com/Jungjee/RawNet)
- [KrishnaDN/RawNet (implementation of RawNet paper)](https://github.com/KrishnaDN/RawNet)
- [facebookresearch/libri-light](https://github.com/facebookresearch/libri-light)
- [espnet/espnet (End-to-End Speech Processing Toolkit)](https://github.com/espnet/espnet)
- [redimnet/EVALUATION.md at master · IDRnD/redimnet](https://github.com/IDRnD/redimnet/blob/master/EVALUATION.md)
- [clovaai/voxceleb_trainer (In defence of metric learning for speaker recognition)](https://github.com/clovaai/voxceleb_trainer)
- [IDRnD/redimnet: The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"](https://github.com/IDRnD/redimnet)
- [kimho1wq/MR-RawNet: Official PyTorch implementation and pre-trained models for MR-RawNet](https://github.com/kimho1wq/mr-rawnet)

---

## Web
- [Wavlm Base Sv · Models · Dataloop](https://dataloop.ai/library/model/microsoft_wavlm-base-sv/)
- [Fine-tuning wav2vec2 for speaker recognition | Papers With Code](https://paperswithcode.com/paper/fine-tuning-wav2vec2-for-speaker-recognition)
- [ESPnet-SPK: full pipeline... | Papers With Code](https://paperswithcode.com/paper/espnet-spk-full-pipeline-speaker-embedding)
- [What is Equal Error Rate (EER)? | Webopedia](https://www.webopedia.com/definitions/equal-error-rate/) (see the EER sketch after this list)
- [Performance for Speaker Identification (EER) - Stack Overflow](https://stackoverflow.com/questions/43315277/performance-for-speaker-identification-equal-error-rate-eer-and-identificati)
- [Home (RawNet 2024)](https://sites.google.com/view/rawnet-2024/)
- [ResearchGate (RawNet paper)](https://www.researchgate.net/publication/335829649_RawNet_Advanced_End-to-End_Deep_Neural_Network_Using_Raw_Waveforms_for_Text-Independent_Speaker_Verification)
- [Information Engineering (robots.ox.ac.uk)](https://www.robots.ox.ac.uk/)
- [VoxCeleb: a large-scale speaker identification dataset | Papers With Code](https://cs.paperswithcode.com/paper/voxceleb-a-large-scale-speaker-identification)
- [Full Text Search - Hugging Face (VoxCeleb)](https://huggingface.co/search/full-text?q=voxceleb&type=dataset)
- [VoxCeleb Speaker Recognition Challenge](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/competition2019.html)
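
A couple of the links above cover Equal Error Rate (EER), the headline metric on the VoxCeleb verification benchmarks. As a minimal sketch (scikit-learn assumed; the label and score arrays are placeholders), EER is read off the ROC curve at the point where the false acceptance and false rejection rates cross:

```python
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """Equal Error Rate: operating point where FAR (fpr) equals FRR (fnr)."""
    fpr, tpr, _ = roc_curve(labels, scores)   # labels: 1 = same speaker, 0 = different
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))     # threshold index where the two rates cross
    return (fpr[idx] + fnr[idx]) / 2

# Placeholder trial labels and similarity scores, just to show the call.
labels = np.array([1, 1, 0, 0, 1, 0])
scores = np.array([0.82, 0.64, 0.31, 0.55, 0.77, 0.12])
print(f"EER = {compute_eer(labels, scores):.2%}")
```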

---

## Hugging Face

- [microsoft/wavlm-large](https://huggingface.co/microsoft/wavlm-large)
- [microsoft/wavlm-base-plus-sv](https://huggingface.co/microsoft/wavlm-base-plus-sv) (see the embedding sketch after this list)
- [microsoft/wavlm-base](https://huggingface.co/microsoft/wavlm-base)
- [WavLM Documentation](https://huggingface.co/docs/transformers/model_doc/wavlm#transformers.WavLMForXVector.config)
- [What is Feature Extraction? - Hugging Face](https://huggingface.co/tasks/feature-extraction)
- [Fine-Tune Wav2Vec2 for English ASR in Hugging Face with Transformers](https://huggingface.co/blog/fine-tune-wav2vec2-english)
- [WavLMMSDD - a Hugging Face Space by bunyaminergen](https://huggingface.co/spaces/bunyaminergen/WavLMMSDD)
- [jungjee/RawNet3](https://huggingface.co/jungjee/RawNet3)
- [espnet/voxcelebs12_rawnet3](https://huggingface.co/espnet/voxcelebs12_rawnet3)
- [openslr/librispeech_asr](https://huggingface.co/datasets/openslr/librispeech_asr)
- [yangwang825/vox1-veri-full](https://huggingface.co/datasets/yangwang825/vox1-veri-full)
- [yangwang825/vox1-iden-3s](https://huggingface.co/datasets/yangwang825/vox1-iden-3s)
- [101arrowz/vox_celeb](https://huggingface.co/datasets/101arrowz/vox_celeb)
- [TwinkStart/VoxCeleb](https://huggingface.co/datasets/TwinkStart/VoxCeleb)
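
The `microsoft/wavlm-base-plus-sv` checkpoint above exposes an x-vector head via `WavLMForXVector` in `transformers` (see the WavLM documentation link). A minimal sketch of extracting speaker embeddings and scoring two utterances by cosine similarity, assuming 16 kHz mono input; the random waveforms are placeholders for real audio:

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, WavLMForXVector

model_id = "microsoft/wavlm-base-plus-sv"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
model = WavLMForXVector.from_pretrained(model_id).eval()

# Two placeholder 3-second 16 kHz waveforms; swap in real audio arrays.
audio = [np.random.randn(48000).astype(np.float32) for _ in range(2)]
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt", padding=True)

with torch.no_grad():
    embeddings = model(**inputs).embeddings            # shape (2, 512) x-vectors
embeddings = torch.nn.functional.normalize(embeddings, dim=-1)

# Cosine similarity between enrolment and test embeddings; threshold to accept/reject.
score = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=-1)
print(f"cosine similarity: {score.item():.3f}")
```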

---

## Wikipedia

- [Time delay neural network](https://en.wikipedia.org/wiki/Time_delay_neural_network)

---

## Dataset

- [Libri-light](https://ai.meta.com/tools/libri-light/)
- [Libri-Light Dataset | Papers With Code](https://paperswithcode.com/dataset/libri-light)
- [libri-light/data_preparation at main · facebookresearch/libri-light](https://github.com/facebookresearch/libri-light/tree/main/data_preparation)
- [talkbank/callhome · Datasets at Hugging Face](https://huggingface.co/datasets/talkbank/callhome)
- [VoxCeleb](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html)
- [veri_test.txt (VoxCeleb1 verification trial list)](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt) (see the parsing sketch after this list)
- [VoxCeleb1 Dataset | Papers With Code](https://paperswithcode.com/dataset/voxceleb1)
- [VoxCeleb Benchmark (Speaker Verification) | Papers With Code](https://paperswithcode.com/sota/speaker-verification-on-voxceleb)
- [VoxCeleb1 Benchmark (Speaker Recognition) | Papers With Code](https://paperswithcode.com/sota/speaker-recognition-on-voxceleb1)
- [The Speakers in the Wild (SITW) Speaker Recognition Database - SRI](https://www.sri.com/publication/speech-natural-language-pubs/the-speakers-in-the-wild-sitw-speaker-recognition-database/)
- [SITW_overlap.txt (overlap list)](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/SITW_overlap.txt)
- [VoxCeleb](https://mm.kaist.ac.kr/datasets/voxceleb/)
- [VoxCeleb download key request (KAIST MM)](https://cn01.mmai.io/keyreq/voxceleb)
- [Speaker Verification Datasets | Papers With Code](https://paperswithcode.com/datasets?q=Speaker+Verification&v=lst&o=match)
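
The `veri_test.txt` trial list above drives the standard VoxCeleb1 verification protocol. A small parsing sketch, assuming the usual three-column format (`<label> <enrolment wav> <test wav>`, one trial per line, label 1 = same speaker, 0 = different):

```python
from pathlib import Path

def load_trials(path):
    """Parse a VoxCeleb-style trial list into (label, enrolment_wav, test_wav) tuples."""
    trials = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue                          # skip blank lines
        label, enrol, test = line.split()
        trials.append((int(label), enrol, test))
    return trials

# Example (path is hypothetical): score each pair, then feed labels/scores into an EER routine.
# trials = load_trials("veri_test.txt")
```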

---

## YouTube

- [RawNet Explained + Code](https://www.youtube.com/watch?v=9lOkPtilD74)

---