Commit 6e550ed

Initial
0 parents  commit 6e550ed

33 files changed: +5456 -0 lines changed

.docs/documentation/CONTRIBUTING.md

Lines changed: 506 additions & 0 deletions
Large diffs are not rendered by default.

.docs/documentation/RESOURCES.md

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
# Resources

---

## Paper

- [\[2110.13900\] WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900)
- [\[1904.08104\] RawNet: Advanced end-to-end deep neural network for speaker verification](https://arxiv.org/abs/1904.08104)
- [Improved RawNet with Feature Map Scaling for Text-Independent Speaker Verification Using Raw Waveforms (ISCA Interspeech 2020)](https://www.isca-archive.org/interspeech_2020/jung20c_interspeech.html)
- [\[2203.08488\] Pushing the limits of raw waveform speaker recognition](https://arxiv.org/abs/2203.08488)
- [\[2406.07103\] MR-RawNet: multiple temporal resolutions for variable duration utterances](https://arxiv.org/abs/2406.07103)
- [\[2011.01108\] End-to-end anti-spoofing with RawNet2](https://arxiv.org/abs/2011.01108)
- [\[1808.00158\] Speaker Recognition from Raw Waveform with SincNet](https://arxiv.org/abs/1808.00158)
- [SincConv in SE](https://arxiv.org/pdf/2403.01785)
- [\[1709.01507\] Squeeze-and-Excitation Networks](https://arxiv.org/abs/1709.01507)
- [\[1803.10963\] Attentive Statistics Pooling for Deep Speaker Embedding](https://arxiv.org/abs/1803.10963)
- [\[2011.05189\] Supervised attention for speaker recognition](https://arxiv.org/abs/2011.05189)
- [Enhancing speaker identification through reverberation modeling and cancelable techniques using ANNs | PLOS ONE](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0294235)
- [ISCA Archive - Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification](https://www.isca-archive.org/interspeech_2022/zhang22j_interspeech.html#:~:text=title%20%20%20%20,1796)
- [Voxceleb: Large-scale speaker verification in the wild](https://www.robots.ox.ac.uk/~vgg/publications/2019/Nagrani19/nagrani19.pdf)
- [chung18a.pdf](https://www.robots.ox.ac.uk/~vgg/publications/2018/Chung18a/chung18a.pdf)
- [nagrani17.pdf](https://www.robots.ox.ac.uk/~vgg/publications/2017/Nagrani17/nagrani17.pdf)
- [\[2408.14886\] The VoxCeleb Speaker Recognition Challenge: A Retrospective](https://arxiv.org/abs/2408.14886)
- [\[1912.07875\] Libri-Light: A Benchmark for ASR with Limited or No Supervision](https://arxiv.org/abs/1912.07875)
- [\[2106.06909\] GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio](https://arxiv.org/abs/2106.06909)
- [\[2101.00390\] VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)
- [2018_icassp_xvectors.pdf](https://www.danielpovey.com/files/2018_icassp_xvectors.pdf)
- [\[2005.07143\] ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification](https://arxiv.org/abs/2005.07143)
- [\[1706.08612\] VoxCeleb: a large-scale speaker identification dataset](https://arxiv.org/abs/1706.08612)
- [\[2005.07143 (PDF)\]](https://arxiv.org/pdf/2005.07143)
- [\[2401.17230v2\] ESPnet-SPK: full pipeline speaker embedding toolkit... (arXiv)](https://arxiv.org/abs/2401.17230v2)
- [\[2401.17230v2 (PDF)\]](https://arxiv.org/pdf/2401.17230v2.pdf)
- [\[2110.13900\] (PDF)](https://arxiv.org/pdf/2110.13900)
- [\[2407.18223\] Reshape Dimensions Network for Speaker Recognition](https://arxiv.org/abs/2407.18223)
- [Voxceleb: Large-scale speaker verification in the wild: Computer Speech and Language: Vol 60, No C](https://dl.acm.org/doi/10.1016/j.csl.2019.101027)
- [\[1912.02522\] VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge](https://arxiv.org/abs/1912.02522)
- [\[1904.08779\] SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition](https://arxiv.org/abs/1904.08779)
- [WavLM model ensemble for audio deepfake detection](https://arxiv.org/html/2408.07414v1)

---

## GitHub

- [unilm/wavlm at master · microsoft/unilm](https://github.com/microsoft/unilm/tree/master/wavlm)
- [unilm/wavlm/modules.py at master · microsoft/unilm](https://github.com/microsoft/unilm/blob/master/wavlm/modules.py)
- [libri-light/data_preparation/README.md at main · facebookresearch/libri-light](https://github.com/facebookresearch/libri-light/blob/main/data_preparation/README.md)
- [bunyaminergen/WavLMMSDD](https://github.com/bunyaminergen/WavLMMSDD)
- [Jungjee/RawNet: Official repository for RawNet, RawNet2, and RawNet3](https://github.com/Jungjee/RawNet)
- [KrishnaDN/RawNet (implementation of RawNet paper)](https://github.com/KrishnaDN/RawNet)
- [facebookresearch/libri-light](https://github.com/facebookresearch/libri-light)
- [espnet/espnet (End-to-End Speech Processing Toolkit)](https://github.com/espnet/espnet)
- [IDRnD/redimnet](https://github.com/IDRnD/redimnet/blob/master/EVALUATION.md)
- [clovaai/voxceleb_trainer (In defence of metric learning for speaker recognition)](https://github.com/clovaai/voxceleb_trainer)
- [IDRnD/redimnet: The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"](https://github.com/IDRnD/redimnet)
- [kimho1wq/MR-RawNet: This repository contains official pytorch implementation and pre-trained models for the MR-RawNet.](https://github.com/kimho1wq/mr-rawnet)

---

## Web

- [Wavlm Base Sv · Models · Dataloop](https://dataloop.ai/library/model/microsoft_wavlm-base-sv/)
- [Fine-tuning wav2vec2 for speaker recognition | Papers With Code](https://paperswithcode.com/paper/fine-tuning-wav2vec2-for-speaker-recognition)
- [ESPnet-SPK: full pipeline... | Papers With Code](https://paperswithcode.com/paper/espnet-spk-full-pipeline-speaker-embedding)
- [What is Equal Error Rate (EER)? | Webopedia](https://www.webopedia.com/definitions/equal-error-rate/)
- [Performance for Speaker Identification (EER) - Stack Overflow](https://stackoverflow.com/questions/43315277/performance-for-speaker-identification-equal-error-rate-eer-and-identificati)
- [Home (RawNet 2024)](https://sites.google.com/view/rawnet-2024/)
- [ISCA Archive - Interspeech 2020 Jung20c (RawNet)](https://www.isca-archive.org/interspeech_2020/jung20c_interspeech.html)
- [ResearchGate (RawNet paper)](https://www.researchgate.net/publication/335829649_RawNet_Advanced_End-to-End_Deep_Neural_Network_Using_Raw_Waveforms_for_Text-Independent_Speaker_Verification)
- [Information Engineering (robots.ox.ac.uk)](https://www.robots.ox.ac.uk/)
- [VoxCeleb: a large-scale speaker identification dataset | Papers With Code](https://cs.paperswithcode.com/paper/voxceleb-a-large-scale-speaker-identification)
- [Full Text Search - Hugging Face (VoxCeleb)](https://huggingface.co/search/full-text?q=voxceleb&type=dataset)
- [SITW_overlap.txt](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/SITW_overlap.txt)
- [VoxCeleb Speaker Recognition Challenge](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/competition2019.html)
- [The Speakers in the Wild (SITW) Speaker Recognition Database - SRI](https://www.sri.com/publication/speech-natural-language-pubs/the-speakers-in-the-wild-sitw-speaker-recognition-database/)

---

## Hugging Face

- [microsoft/wavlm-large](https://huggingface.co/microsoft/wavlm-large)
- [microsoft/wavlm-base-plus-sv](https://huggingface.co/microsoft/wavlm-base-plus-sv)
- [microsoft/wavlm-base](https://huggingface.co/microsoft/wavlm-base)
- [WavLM Documentation](https://huggingface.co/docs/transformers/model_doc/wavlm#transformers.WavLMForXVector.config)
- [What is Feature Extraction? - Hugging Face](https://huggingface.co/tasks/feature-extraction)
- [Fine-Tune Wav2Vec2 for English ASR in Hugging Face with Transformers](https://huggingface.co/blog/fine-tune-wav2vec2-english)
- [WavLMMSDD - a Hugging Face Space by bunyaminergen](https://huggingface.co/spaces/bunyaminergen/WavLMMSDD)
- [jungjee/RawNet3](https://huggingface.co/jungjee/RawNet3)
- [espnet/voxcelebs12_rawnet3](https://huggingface.co/espnet/voxcelebs12_rawnet3)
- [openslr/librispeech_asr](https://huggingface.co/datasets/openslr/librispeech_asr)
- [yangwang825/vox1-veri-full](https://huggingface.co/datasets/yangwang825/vox1-veri-full)
- [yangwang825/vox1-iden-3s](https://huggingface.co/datasets/yangwang825/vox1-iden-3s)
- [101arrowz/vox_celeb](https://huggingface.co/datasets/101arrowz/vox_celeb)
- [TwinkStart/VoxCeleb](https://huggingface.co/datasets/TwinkStart/VoxCeleb)

---

## Wikipedia

- [Time delay neural network](https://en.wikipedia.org/wiki/Time_delay_neural_network)

---

## Dataset

- [Libri-light](https://ai.meta.com/tools/libri-light/)
- [Libri-Light Dataset | Papers With Code](https://paperswithcode.com/dataset/libri-light)
- [libri-light/data_preparation at main · facebookresearch/libri-light](https://github.com/facebookresearch/libri-light/tree/main/data_preparation)
- [talkbank/callhome · Datasets at Hugging Face](https://huggingface.co/datasets/talkbank/callhome?row=1)
- [VoxCeleb](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html)
- [veri_test.txt (VoxCeleb1)](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt)
- [VoxCeleb1 Dataset | Papers With Code](https://paperswithcode.com/dataset/voxceleb1)
- [VoxCeleb Benchmark (Speaker Verification) | Papers With Code](https://paperswithcode.com/sota/speaker-verification-on-voxceleb)
- [VoxCeleb1 Benchmark (Speaker Recognition) | Papers With Code](https://paperswithcode.com/sota/speaker-recognition-on-voxceleb1)
- [The Speakers in the Wild (SITW) Speaker Recognition Database - SRI](https://www.sri.com/publication/speech-natural-language-pubs/the-speakers-in-the-wild-sitw-speaker-recognition-database/)
- [SITW_overlap.txt (overlap list)](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/SITW_overlap.txt)
- [VoxCeleb](https://mm.kaist.ac.kr/datasets/voxceleb/)
- [KAIST MM](https://cn01.mmai.io/keyreq/voxceleb)
- [Speaker Verification Datasets | Papers With Code](https://paperswithcode.com/datasets?q=Speaker+Verification&v=lst&o=match)

---

## YouTube

- [RawNet Explained + Code](https://www.youtube.com/watch?v=9lOkPtilD74)

---
