A curated list of recent and important papers, tools, and resources on Diffusion Large Language Models (DLMs).
Contributions are welcome! Please submit a PR or open an issue. Your help makes this list even more awesome!
- Google's Gemini Diffusion
- Inception Labs' Mercury
- **LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs**. TLDR: First systematic study of the long-context capabilities of diffusion LLMs. Proposes LongLLaDA, a training-free extrapolation method, and finds that diffusion LLMs outperform AR models on some long-context tasks.
- **MMaDA: Multimodal Large Diffusion Language Models**. TLDR: Introduces MMaDA, a unified multimodal diffusion model excelling in textual reasoning, visual understanding, and text-to-image generation. Outperforms several SOTA models across domains.
- **Large Language Diffusion Models**. TLDR: LLaDA is a Transformer-based diffusion LLM trained from scratch. Matches or beats strong AR baselines such as LLaMA3 8B, especially on instruction following and reversal tasks.
- **Simple and Effective Masked Diffusion Language Models**. TLDR: Shows that masked diffusion models can nearly match AR performance with simple training tricks and a Rao-Blackwellized objective. Supports semi-autoregressive generation. A minimal sketch of the objective appears after this list.
- **Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models**. TLDR: Proposes block diffusion models that combine the strengths of AR and diffusion methods. Enables flexible-length generation and improved efficiency via KV caching. A decoding sketch appears after this list.
- **Scaling Diffusion Language Models via Adaptation from Autoregressive Models**. TLDR: Converts AR models such as GPT-2 and LLaMA into diffusion models (DiffuGPT, DiffuLLaMA) through continual pretraining. Matches AR models on several benchmarks with less data.
- **Unifying Autoregressive and Diffusion-Based Sequence Generation**. TLDR: Proposes hyperschedules and hybrid noise processes to blend AR and diffusion models. Achieves strong perplexity and diverse generation while enabling error correction during decoding.
- **Denoising Diffusion Implicit Models**. TLDR: Introduces DDIMs, which speed up diffusion sampling by 10–50× using non-Markovian processes while maintaining high sample quality. A single-step sampling sketch appears after this list.
- WIP
- [Dream](https://github.com/HKUNLP/Dream). TLDR: A 7B diffusion LLM that rivals leading AR models in performance, demonstrating the feasibility of scalable diffusion-based LLMs.
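
The Rao-Blackwellized masked-diffusion objective referenced above reduces, in its simplest continuous-time form, to a reweighted cross-entropy on randomly masked tokens. Below is a minimal PyTorch sketch of one training step under that interpretation; `MASK_ID`, the `model` interface, and the final normalization are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

MASK_ID = 50257  # hypothetical id of the special [MASK] token

def masked_diffusion_loss(model, tokens):
    """One masked-diffusion training step (MDLM/LLaDA-style sketch).

    tokens: LongTensor (batch, seq_len) of clean token ids.
    model:  any bidirectional Transformer mapping ids to logits of shape
            (batch, seq_len, vocab) -- an assumed interface.
    """
    b, n = tokens.shape
    # Sample a masking level t ~ U(0, 1] per sequence.
    t = torch.rand(b, 1, device=tokens.device).clamp_min(1e-3)
    # Corrupt: mask each token independently with probability t.
    mask = (torch.rand(b, n, device=tokens.device) < t).float()
    noisy = torch.where(mask.bool(), torch.full_like(tokens, MASK_ID), tokens)
    # Predict the clean tokens from the partially masked sequence.
    logits = model(noisy)
    ce = F.cross_entropy(
        logits.view(-1, logits.size(-1)), tokens.view(-1), reduction="none"
    ).view(b, n)
    # Score only masked positions, weighted by 1/t (the continuous-time
    # ELBO weighting); normalizing by the masked-token count is one choice.
    return (ce * mask / t).sum() / mask.sum().clamp_min(1.0)
```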
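
Block diffusion's decoding pattern, autoregressive across blocks but iterative denoising within each block, can be illustrated with a short loop. The sketch below is a hypothetical rendering of that idea, not the paper's algorithm: the confidence-based unmasking schedule, `MASK_ID`, and the `model` interface are assumptions, and a real implementation would serve the fixed prefix from a KV cache instead of re-encoding it each step.

```python
import torch

MASK_ID = 50257  # hypothetical [MASK] token id

@torch.no_grad()
def block_diffusion_generate(model, prompt, num_blocks=4, block_len=32, steps=8):
    """Block-wise decoding sketch: AR over blocks, diffusion inside each block."""
    seq = prompt.clone()
    for _ in range(num_blocks):
        # Append a fully masked block; everything to its left stays fixed.
        block = torch.full((seq.size(0), block_len), MASK_ID,
                           dtype=seq.dtype, device=seq.device)
        seq = torch.cat([seq, block], dim=1)
        for step in range(steps):
            probs = model(seq)[:, -block_len:, :].softmax(-1)
            conf, pred = probs.max(-1)
            block_view = seq[:, -block_len:]        # view into seq
            still_masked = block_view == MASK_ID
            # Cumulatively reveal the highest-confidence masked positions.
            k = max(1, block_len * (step + 1) // steps)
            conf = conf.masked_fill(~still_masked, -1.0)
            idx = conf.topk(k, dim=-1).indices
            row = torch.arange(seq.size(0), device=seq.device).unsqueeze(-1)
            keep = still_masked[row, idx]           # guard already-revealed slots
            block_view[row, idx] = torch.where(keep, pred[row, idx],
                                               block_view[row, idx])
    return seq
```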
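
The DDIM update itself is compact enough to state directly. The sketch below performs a single reverse step in the usual cumulative-alpha notation; the `eps_model` signature is an assumption. Setting `eta=0` gives the deterministic sampler, and taking `t_prev` many steps below `t` is what yields the 10–50× speedup.

```python
import torch

@torch.no_grad()
def ddim_step(eps_model, x_t, t, t_prev, alpha_bar, eta=0.0):
    """One DDIM reverse step from x_t to x_{t_prev}.

    eps_model: noise predictor eps_theta(x_t, t) -- an assumed signature.
    alpha_bar: 1-D tensor of cumulative products of (1 - beta).
    """
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
    eps = eps_model(x_t, t)
    # Predict the clean sample x_0 from the current noise estimate.
    x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    # Stochasticity of the non-Markovian reverse process; zero when eta = 0.
    sigma = eta * ((1 - a_prev) / (1 - a_t)).sqrt() * (1 - a_t / a_prev).sqrt()
    noise = sigma * torch.randn_like(x_t) if eta > 0 else 0.0
    return a_prev.sqrt() * x0 + (1 - a_prev - sigma**2).sqrt() * eps + noise
```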
We wholeheartedly welcome contributions from the community! ✨
- 📄 Add new papers: Discovered a groundbreaking paper? Share it with us!
- 🧪 Share codebases or demos: Have a cool implementation or demo? Let's showcase it!
- 📊 Submit tutorials, blog posts, or benchmarks: Help others learn and evaluate DLMs.
Feel free to open a pull request or issue!
Maintained by Alessio Devoto