Speech Recognition and Generation course for ITMO AI Talent Hub

Course materials in this repo are in English
All lectures are given in Russian

Course structure

10+ lectures
2 personal assignments [50 pts]
2 group projects [50 pts]
1 research seminar [10 pts]

Deadlines for all coursework are given in the corresponding descriptions

Syllabus

This course is being given for the first time, so the syllabus may change slightly as the course progresses.

Basics of Digital Signal Processing
Classic ASR and metrics
End-to-End ASR with CTC and audio augmentations
Encoder-Decoder End-to-End ASR and decoding with LM
Self-supervised speech representations
SSL-finetuned ASR and Whisper
Text-to-Speech systems
Neural vocoders
Modern TTS with normalizing flows and diffusion
Neural Codec Language Models: VALL-E
Extra: Speaker recognition and speech inpainting

Resources
