A curated list of recent and important papers, tools, and resources on Diffusion Large Language Models (DLMs).
Contributions are welcome! Please submit a PR or open an issue. Your help makes this list even more awesome!
- Google's Gemini Diffusion
- Inception Labs' Mercury
- **LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs**. TLDR: First systematic study of the long-context capabilities of diffusion LLMs. Proposes LongLLaDA, a training-free extrapolation method, and finds that diffusion LLMs outperform AR models on some long-context tasks.
- **MMaDA: Multimodal Large Diffusion Language Models**. TLDR: Introduces MMaDA, a unified multimodal diffusion model excelling in textual reasoning, visual understanding, and text-to-image generation. Outperforms several SOTA models across domains.
- **Large Language Diffusion Models**. TLDR: LLaDA is a Transformer-based diffusion LLM trained from scratch. Matches or beats strong AR baselines such as LLaMA3 8B, especially on instruction following and reversal tasks.
- **Simple and Effective Masked Diffusion Language Models**. TLDR: Shows that masked diffusion models can nearly match AR performance with simple training tricks and a Rao-Blackwellized objective. Supports semi-autoregressive generation. A minimal sketch of the objective appears after this list.
- **Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models**. TLDR: Proposes block diffusion models that combine the strengths of AR and diffusion methods. Enables flexible-length generation and improved efficiency via KV caching. A decoding sketch appears after this list.
- **Scaling Diffusion Language Models via Adaptation from Autoregressive Models**. TLDR: Converts AR models such as GPT-2 and LLaMA into diffusion models (DiffuGPT, DiffuLLaMA) through continual pretraining. Matches AR models on several benchmarks with less data.
- **Unifying Autoregressive and Diffusion-Based Sequence Generation**. TLDR: Proposes hyperschedules and hybrid noise processes to blend AR and diffusion models. Achieves strong perplexity and diverse generation while enabling error correction during decoding.
- **Denoising Diffusion Implicit Models**. TLDR: Introduces DDIMs, which speed up diffusion sampling by 10–50× using non-Markovian processes while maintaining high sample quality. A single-step sampling sketch appears after this list.
- WIP
- [Dream](https://github.com/HKUNLP/Dream). TLDR: A 7B diffusion LLM that rivals leading AR models in performance, demonstrating the feasibility of scalable diffusion-based LLMs.
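
The Rao-Blackwellized masked-diffusion objective referenced above reduces, in its simplest continuous-time form, to a reweighted cross-entropy on randomly masked tokens. Below is a minimal PyTorch sketch of one training step under that interpretation; `MASK_ID`, the `model` interface, and the final normalization are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

MASK_ID = 50257  # hypothetical id of the special [MASK] token

def masked_diffusion_loss(model, tokens):
    """One masked-diffusion training step (MDLM/LLaDA-style sketch).

    tokens: LongTensor (batch, seq_len) of clean token ids.
    model:  any bidirectional Transformer mapping ids to logits of shape
            (batch, seq_len, vocab) -- an assumed interface.
    """
    b, n = tokens.shape
    # Sample a masking level t ~ U(0, 1] per sequence.
    t = torch.rand(b, 1, device=tokens.device).clamp_min(1e-3)
    # Corrupt: mask each token independently with probability t.
    mask = (torch.rand(b, n, device=tokens.device) < t).float()
    noisy = torch.where(mask.bool(), torch.full_like(tokens, MASK_ID), tokens)
    # Predict the clean tokens from the partially masked sequence.
    logits = model(noisy)
    ce = F.cross_entropy(
        logits.view(-1, logits.size(-1)), tokens.view(-1), reduction="none"
    ).view(b, n)
    # Score only masked positions, weighted by 1/t (the continuous-time
    # ELBO weighting); normalizing by the masked-token count is one choice.
    return (ce * mask / t).sum() / mask.sum().clamp_min(1.0)
```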
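
Block diffusion's decoding pattern, autoregressive across blocks but iterative denoising within each block, can be illustrated with a short loop. The sketch below is a hypothetical rendering of that idea, not the paper's algorithm: the confidence-based unmasking schedule, `MASK_ID`, and the `model` interface are assumptions, and a real implementation would serve the fixed prefix from a KV cache instead of re-encoding it each step.

```python
import torch

MASK_ID = 50257  # hypothetical [MASK] token id

@torch.no_grad()
def block_diffusion_generate(model, prompt, num_blocks=4, block_len=32, steps=8):
    """Block-wise decoding sketch: AR over blocks, diffusion inside each block."""
    seq = prompt.clone()
    for _ in range(num_blocks):
        # Append a fully masked block; everything to its left stays fixed.
        block = torch.full((seq.size(0), block_len), MASK_ID,
                           dtype=seq.dtype, device=seq.device)
        seq = torch.cat([seq, block], dim=1)
        for step in range(steps):
            probs = model(seq)[:, -block_len:, :].softmax(-1)
            conf, pred = probs.max(-1)
            block_view = seq[:, -block_len:]        # view into seq
            still_masked = block_view == MASK_ID
            # Cumulatively reveal the highest-confidence masked positions.
            k = max(1, block_len * (step + 1) // steps)
            conf = conf.masked_fill(~still_masked, -1.0)
            idx = conf.topk(k, dim=-1).indices
            row = torch.arange(seq.size(0), device=seq.device).unsqueeze(-1)
            keep = still_masked[row, idx]           # guard already-revealed slots
            block_view[row, idx] = torch.where(keep, pred[row, idx],
                                               block_view[row, idx])
    return seq
```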
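
The DDIM update itself is compact enough to state directly. The sketch below performs a single reverse step in the usual cumulative-alpha notation; the `eps_model` signature is an assumption. Setting `eta=0` gives the deterministic sampler, and taking `t_prev` many steps below `t` is what yields the 10–50× speedup.

```python
import torch

@torch.no_grad()
def ddim_step(eps_model, x_t, t, t_prev, alpha_bar, eta=0.0):
    """One DDIM reverse step from x_t to x_{t_prev}.

    eps_model: noise predictor eps_theta(x_t, t) -- an assumed signature.
    alpha_bar: 1-D tensor of cumulative products of (1 - beta).
    """
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
    eps = eps_model(x_t, t)
    # Predict the clean sample x_0 from the current noise estimate.
    x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    # Stochasticity of the non-Markovian reverse process; zero when eta = 0.
    sigma = eta * ((1 - a_prev) / (1 - a_t)).sqrt() * (1 - a_t / a_prev).sqrt()
    noise = sigma * torch.randn_like(x_t) if eta > 0 else 0.0
    return a_prev.sqrt() * x0 + (1 - a_prev - sigma**2).sqrt() * eps + noise
```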
We wholeheartedly welcome contributions from the community! ✨
- 📄 Add new papers: Discovered a groundbreaking paper? Share it with us!
- 🧪 Share codebases or demos: Have a cool implementation or demo? Let's showcase it!
- 📊 Submit tutorials, blog posts, or benchmarks: Help others learn and evaluate DLMs.
Feel free to open a pull request or issue!
Maintained by Alessio Devoto