This is a curated repository of materials for deep learning.
- Courses
- Tutorials
- Curated lists
- Papers
- Articles
- Books
- Reports
- Models
- Benchmarks
- Datasets
- Tools
- APIs
- Initiatives
- Lectures
- Implementations
- Introduction to Deep Learning (MIT)
- Introduction to Deep Learning (Carnegie Mellon)
- Deep Learning (Stanford)
- Mathematical Engineering of Deep Learning
- Generative AI for beginners
- Stanford Cheatsheets:
- Courses by Sebastian Raschka:
- Prompt Engineering Guide
- Foundations of Deep Learning (University of Maryland)
- K-12 Prompt Engineering Guide
- Machine Learning by L. Marti and N. Sanchez-Pi
- Recent Advances on Foundation Models
- Build a Large Language Model (From Scratch)
- GPT in 60 lines of NumPy
- Implementing Mixtral with Neural Circuit Diagrams
- Hello Evo: From DNA generation to protein folding
- University of Amsterdam Deep Learning Tutorials web Github
- Zero to LitGPT: Getting Started with Pretraining, Finetuning, and Using LLMs
- Annotated Deep Learning Research Papers Implementations web Github
- llamafile is the new best way to run a LLM on your own computer
- Bash One-Liners for LLMs
- A primer on algorithmic differentiation
- Creating a Transformer From Scratch - Part One: The Attention Mechanism
- Creating a Transformer From Scratch - Part Two: The Rest of the Transformer
- The Annotated Diffusion Model
- Defusing Diffusion Models
- The Illustrated Stable Diffusion
- build nanoGPT by Andrej Karpathy Github Youtube
- Let's build GPT from scratch by Andrej Karpathy Youtube
- AI by hand Substack Twitter
- The Matrix Calculus You Need For Deep Learning
- CNN from Scratch with pure Mathematical Intuition
- Machine Learning from Scratch
- The Incredible PyTorch
- Statistics and machine learning: from undergraduate to research
- Awesome Graph-Related Large Language Models
- Deep Learning Drizzle
- Deep Learning Papers Reading Roadmap
- Awesome Interpretability in Large Language Models
- Role play with large language models (2023) Nature.com arXiv
- Attention is All You Need (2017) arXiv v7 (2023)
- OLMo: Accelerating the Science of Language Models (2024) arXiv
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (2024) blog paper
- Datasets for Large Language Models: A Comprehensive Survey (2024) arXiv
- Large Language Models (LLMs) on Tabular Data (2024) arXiv
- Chain-of-Thought Reasoning Without Prompting (2024) arXiv
- Holistic Evaluation of Language Models (2022) arXiv
- Understanding LLMs: A Comprehensive Overview from Training to Inference (2024) arXiv
- Large Language Models: A Survey (2024) arXiv
- Grandmaster-Level Chess Without Search (2024) arXiv
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (2024) arXiv Github
- MacGyver: Are Large Language Models Creative Problem Solvers? (2024) arXiv Github
- Revisiting Unreasonable Effectiveness of Data in Deep Learning Era (2017) paper
- Understanding Deep Learning Requires Rethinking Generalization (2017) arXiv
- Understanding Deep Learning (Still) Requires Rethinking Generalization (2021) paper
- Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets (2022) arXiv
- Deep Double Descent: Where Bigger Models and More Data Hurt (2019) arXiv
- Unifying Grokking and Double Descent (2023) arXiv
- Textbooks Are All You Need (2023) arXiv
- LoRA: Low-Rank Adaptation of Large Language Models (2021) arXiv
- Fundamental Components of Deep Learning: A category-theoretic approach (2023) arXiv
- Compression Represents Intelligence Linearly arXiv HuggingFace
- Rules of Machine Learning - Best Practices for ML Engineering web
- General Intelligence Requires Rethinking Exploration (2024) arXiv
- Testing theory of mind in large language models and humans (2024) paper
- Diffusion Models Are Real-Time Game Engines web
- Compact Language Models via Pruning and Knowledge Distillation paper
- RLHF: Reinforcement Learning from Human Feedback
- How do LLMs give truthful answers?
- Models All the Way Down
- Large language models can do jaw-dropping things. But nobody knows exactly why
- How Chain-of-Thought Reasoning Helps Neural Networks Compute
- Will Transformers Take Over Artificial Intelligence?
- The Hidden Convex Optimization Landscape of Two-Layer ReLU Networks
- Building Diffusion Model's theory from ground up
- Generative AI exists because of the transformer
- The "it" in AI models is the dataset
- Build what you need and use what you build - An approach to impactful science
- I. From GPT-4 to AGI: Counting the OOMs
- The Roadmap of Mathematics for Machine Learning
- Understanding Deep Learning (2023)
- Deep Learning
- Dive into Deep Learning
- Hands-On Machine Learning with R
- Interpretable Machine Learning
- Supervised Machine Learning for Science
- Machine Learning Engineering Open Book
- Little Book of Deep Learning
- Geometric Deep Learning
- Foundations of Vector Retrieval
- Alice's adventures in differentiable wonderland
- The Elements of Differentiable Programming
- Modeling Mindsets
- Practical Deep Learning for Coders
- Machine Learning - The Basics
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
- Reinforcement Learning from Human Feedback Book
- Foundations of Large Language Models
- The Big Book of Large Language Models
- The State of Competitive Machine Learning (2023)
- Science in the age of AI (2024)
- Thousands of AI Authors on the Future of AI (2024)
- Moondream - tiny vision language model web Github
- BioMistral - pretrained LLM models for biomedical domain arXiv HuggingFace
- StructLM - Generalist Model for Structured Knowledge Grounding web arXiv Github HuggingFace
- Gemma - Open Models Based on Gemini Research and Technology HuggingFace paper
- Large World Model web Github HuggingFace arXiv
- ChemLLM: A Chemical Large Language Model arXiv HuggingFace
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model arXiv HuggingFace web
- Llama 3 web Github
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research arXiv HuggingFace
- Internet Archive Public Domain English Books HuggingFace
- Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models web arXiv
- Cosmopedia - dataset of synthetic textbooks, blogposts, stories, posts and WikiHow articles web HuggingFace
- FineWeb - 15T tokens of deduplicated English text from CommonCrawl HuggingFace
- Ollama - run Llama 2, Mistral and Gemma locally web Github
- llama.cpp - standalone LLM inference in C/C++ for Llama models Github
- gemma.cpp - standalone LLM inference in C/C++ for Gemma models Github
- DSPy - framework for programming foundation models arXiv Github
- LangChain - framework for building LLM applications web Github
- trl - Transformer Reinforcement Learning Github
- Transformers - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX Github HuggingFace
- Transformers.js - State-of-the-art Machine Learning for the web HuggingFace
- Tokenizer Playground
- Neural Networks Playground
- MeloTTS - multi-lingual text-to-speech library HuggingFace demo Github
- Unsloth - finetune Mistral, Gemma, and Llama models 2-5x faster with 70% less memory Github
- llm.c - LLM (GPT-2) training in simple, pure C/CUDA Github
- LLM transparency tool Github
- CoreNet - Apple's library for deep learning Github
- Cohere toolkit - quickly deploy RAG applications Github
- wllama - WebAssembly bindings for llama.cpp Github
- elia - LLM in terminal Github
- MiniTorch Github
- Occiglot - research collective for open-source European LLMs web HuggingFace
- Open Sora Github
- A little guide to building Large Language Models in 2024 Youtube slides
- Intro to LLMs by Andrej Karpathy Youtube slides
- The spelled-out intro to neural networks and backpropagation: building micrograd by Andrej Karpathy Youtube
- Attention in transformers, visually explained (Chapter 6, Deep Learning) Youtube
- Intuition Behind Self-Attention Mechanism in Transformer Networks Youtube
- PyTorch implementations of various papers Github