I'm developing this course to contain everything you need to:
- Join elite labs like OpenAI, Google, or MIT
- Independently publish groundbreaking open-source research
- Build world-class models
Follow along with course walkthroughs, video tutorials, and explanations.
For advanced learners, check out the Speedruns: research, engineering, and optimization challenges that help you:
- Contribute to open-source
- Build real skills by doing
Start with the Beginner Python Course to get up to speed.
Make a copy of the notebooks: Open the notebook → File → Save a copy in Drive
- Intro course - Deep Learning by Professor Bryce - YouTube
- PyTorch Fundamentals: From Linear Layers & Weight Intuition to LayerNorm, Variance, and Custom ML Blocks - Google Colab - YouTube - Bilibili
- Code Softmax, Cross-Entropy, and Gradients — From Scratch (No Torch) (in development) - Google Colab
- Chain Rule & Backpropagation From Scratch - Google Colab
- Comparing MatMul: PyTorch Native vs Tiling vs Quantization (in development) - Google Colab
- Make Matrix Multiply 3x Faster by Padding Size to Power of 2 - Google Colab
- How Matrix Shape Affects Performance on Nvidia T4 Tensor Cores (in development) - Google Colab
- TODO: how to optimize matmuls on specific GPUs
- Experimenting With Small Character-Level LLM: Hyperparameters, Optimization, and Model Scaling - Paper - Google Colab
- Train a Small LLM From Scratch In 50 Min - Google Colab
- Simplest diffusion model to generate points on a circle - Google Colab
- Code & train a small diffusion model to calculate A mod B - Google Colab
- Understand Simple Autoencoder - Google Colab
I had no idea autoencoders were so quick to train: a few seconds for an autoencoder of numbers (0-10,000).
The encoder takes a number (56) -> vector embedding [0.3, 0.7, 0.42, ...] -> the decoder aims to predict the encoded number (56) from the vector embedding. These vector embeddings contain a rich representation of the encoded input (number, token, sentence, ...) that can be used in models like LLMs, diffusion models, etc.
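To make that concrete, here's a minimal sketch (my own toy example, not the course notebook) of a number autoencoder in PyTorch; the 8-dimensional embedding and the small MLP sizes are arbitrary illustrative choices:

```python
# Toy number autoencoder: number -> embedding -> reconstructed number.
# All sizes and hyperparameters here are illustrative assumptions, not the notebook's.
import torch
import torch.nn as nn

EMB_DIM = 8  # assumed embedding size

class NumberAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, EMB_DIM))
        self.decoder = nn.Sequential(nn.Linear(EMB_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        z = self.encoder(x)      # number -> vector embedding
        return self.decoder(z)   # vector embedding -> reconstructed number

model = NumberAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    # sample numbers in [0, 10000) and scale to [0, 1] to keep training stable
    x = torch.randint(0, 10_000, (256, 1)).float() / 10_000
    loss = nn.functional.mse_loss(model(x), x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```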
I'm figuring out autoencoders because I think LLMs should process sentences, not tokens: sentences can represent an infinite number of concepts, as opposed to a limited token vocabulary (usually around 150K tokens).
Predicting over an infinite distribution requires diffusion models (think of the seemingly infinite number of possible images); an autoregressive model would just predict the blurry average of the image or sentence, without any meaning.
A diffusion model also lets us do truly unified training in the same latent space for visual and text data.
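The "blurry average" point is easy to see with the points-on-a-circle example from the diffusion notebook above. The snippet below is my own toy illustration (not code from the notebooks): the single MSE-optimal prediction for a multimodal target is its mean, which here is the circle's center rather than any valid point on the circle; a diffusion model instead learns to sample from the full distribution.

```python
# Toy illustration of the "blurry average" failure mode (my own example).
import torch

# 10k points uniformly distributed on the unit circle (a multimodal target set)
theta = torch.rand(10_000) * 2 * torch.pi
data = torch.stack([torch.cos(theta), torch.sin(theta)], dim=1)

# The MSE-optimal single prediction is the mean of the data...
mean_pred = data.mean(dim=0)
print(mean_pred)          # ~[0, 0]: the circle's center
print(mean_pred.norm())   # ~0, far from the true radius of 1
# ...so a deterministic regressor collapses to a point that is not on the circle at all,
# whereas a diffusion model samples points that actually lie on the circle.
```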
- TMA (Tensor Memory Accelerator) alignment for fast memory on Hopper GPUs (DeepSeek's speed) - Google Colab
- High-Performance GPU Matrix Multiplication on H800, H100 & H200 from Scratch - Google Colab
- Looking for patterns in trained neural network weights - Google Colab - Preview PDF Analysis (in development)