- Course materials are available in English in this repo
- All lectures are given in Russian
- 10+ lectures
- 2 personal assignments [50 pts]
- 2 group projects [50 pts]
- 1 research seminar [10 pts]
Deadlines for all coursework are given in the corresponding descriptions.
This course is being taught for the first time, so the syllabus may change slightly as it progresses.
- Basics of Digital Signal Processing
- Classic ASR and metrics
- End-to-End ASR with CTC and audio augmentations
- Encoder-Decoder End-to-End ASR and decoding with LM
- Self-supervised speech representations
- SSL-finetuned ASR and Whisper
- Text-to-Speech systems
- Neural vocoders
- Modern TTS with normalizing flows and diffusion
- Neural Codec Language Models: VALL-E
- Extra: Speaker recognition and speech inpainting