This page is accessible via roadmap.vllm.ai
This is a living document! For each item here, we intend to link the RFC as well as the discussion channel in the vLLM Slack.
Core Themes
In Q3, we continue to iterate towards vLLM 1.0 by fully removing the V0 code path, optimizing and extending the core scheduler, making sure vLLM can serve the world's most demanding workloads, and enhancing out-of-the-box usability and performance.
V1 Engine
- V0 Feature Parity and Native Features (#sig-v1)
- Pooling Model
- Mamba Model ([v1] Support mamba2 #19327)
- Priority Scheduling ([Core] feat: Implement Priority Scheduling in V1 Engine #19057)
- Custom Logits Processing (see the sketch after this list)
- CPU KV Cache
- Investigate Encoder-Decoder Support (v1: Add Whisper model support (encoder-decoder) #21088)
- Performance
- Async Scheduling
- Optimize Input Preparation (Persistent Batch V2)
- Speculative Decoding Enhancements (Suffix Decoding, CUDA Graph/torch.compile support)
- Multimodal Processing
- Simplification
- Parallel Input Processing
- Reduce Serialization & Broadcasting Overheads
- Investigate streaming input and output
- Design Documentation
- Hybrid Memory Allocator
- Core Scheduler Design
- Speculative Decoding Design
- Hardware Platform Guide
- Model API Guide
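
Custom logits processing today flows through the per-request callable that `SamplingParams` has historically accepted; the V1 design may change this surface. A minimal sketch under that assumption (model name is a placeholder):

```python
# Minimal sketch of a custom logits processor using the historical
# SamplingParams(logits_processors=...) callable interface; the finalized
# V1 interface may differ. The model below is a placeholder.
import torch
from vllm import LLM, SamplingParams

def ban_token_42(token_ids: list[int], logits: torch.Tensor) -> torch.Tensor:
    # Mask out token id 42 before sampling.
    logits[42] = float("-inf")
    return logits

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(max_tokens=16, logits_processors=[ban_token_42])
print(llm.generate(["Hello, my name is"], params)[0].outputs[0].text)
```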
User Experience
- Fit and Finish (#feat-startup-ux) - [Feature]: Improve startup time UX #19824
- Fast startup
- Clean startup log
- Clean up configuration items
- Performance Tuning Guide
- Accelerator UX Audit and Document Feature Coverage
- Stability and Testing
- Comprehensive Reproducible Performance Suite
- Enhance and Report Accuracy Suite
- Large Scale Deployment Tested in CI
- Stress and Longevity Testing
- Improve the Stability of the vLLM-torch.compile Integration
- Robust Tool Use Parsing
- Operational Experience
- Request Level SLO Targeting & Enhanced Autoscaling/Tuning
- Improve Logging and Tracing Code Path (see the sketch after this list)
- Debugging tool for perf profiling and numerics
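
For the logging and debugging items above, a sketch of the environment-variable knobs this work builds on; `VLLM_LOGGING_LEVEL` and `VLLM_TRACE_FUNCTION` are existing debugging switches, and both must be set before `vllm` is imported:

```python
# Sketch of today's debugging knobs that the logging/tracing roadmap items
# build on. Set both variables before importing vllm.
import os

os.environ["VLLM_LOGGING_LEVEL"] = "DEBUG"  # verbose engine logs
os.environ["VLLM_TRACE_FUNCTION"] = "1"     # log every function call; very slow, debugging only

from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # placeholder model
print(llm.generate(["Hi"])[0].outputs[0].text)
```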
Large Scale Serving
- Stable Scale-Out Serving for Mixture-of-Experts Models (see the sketch after this list)
- Enhance and Document Data Parallelism
- Stabilize Expert Parallelism Routing x GEMM Options
- Expert Parallel Load Balancing ([Feature] Expert Parallelism Load Balancer (EPLB) #18343)
- Transfer KV Cache through CPU
- Communication & Computation Overlap
- Disaggregated Serving
- Standardized Dataflow between P <> D
- Autoscaling P & D Replicas
- Multi-modality Support
- Speculative Decoding Support
- Prefill-Only Mode
- Elastic EP and Fault Tolerance
- Enhancement to the KV Transfer API for Request Migration and KV Cache Priming
- Investigate Context Parallelism
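
To ground the scale-out items above, a hedged sketch of the offline entry point with expert parallelism enabled; `tensor_parallel_size` and `enable_expert_parallel` are existing engine arguments, while the model name and parallel size are placeholders, not recommendations:

```python
# Hedged sketch: running an MoE model with expert parallelism enabled.
# tensor_parallel_size and enable_expert_parallel are existing engine args;
# the model name and parallel size are placeholders.
from vllm import LLM

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # placeholder MoE model
    tensor_parallel_size=8,       # shard dense layers across 8 GPUs
    enable_expert_parallel=True,  # distribute experts across the same GPUs
)
print(llm.generate(["The capital of France is"])[0].outputs[0].text)
```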
Features
Models
- Support Multiple Training and Model Authoring Frameworks by Opening up the Interface for Tokenizer, Configuration, and Processor
- Investigate Sparse Attention Mechanism
- Performance Enhancements for Small Models (<1B scale)
Hardware
- NVIDIA
- Enhance Blackwell Support
- GB200 NVL72
- AMD
- MI350X: MXFP4 Support
- Large Scale Serving Support
- Official Wheels and Containers Distributed
- Document Feature Parity and Performance Numbers
- TPU
- Progress in Ironwood Support
- Official Wheels and Containers Distributed
- Document Feature Parity and Performance Numbers
- Neuron
- Plugin for V1 Architecture
- Document Feature Parity and Performance Numbers
- Intel
- Stable CPU Release with Wheels and Containers
- Stable XPU Support
- HPU (Gaudi) Move to Plugin
- Platform Plugins
- Stable and Tested Interfaces
Use Cases
- RLHF
- Test Popular Frameworks that Integrate with vLLM for Performance and to Prevent Breakages
- Weight loading optimization for syncing and resharding
- Custom checkpoint loader, custom model format
- Multi-turn scheduling
- Evaluation
- Support Full Determinism (with/without prefix cache) Regardless of Batching Order (see the sketch after this list)
- Batch Inference
- Simple Data Parallel Router for Scale Out with Prefix Caching
- CPU KV cache offloading
- Explore Bundling of Configuration for Specializations
- Low Latency Code Completion
- High Throughput Multi-turn Agentic Rollout
- Large Scale Image and Video Understanding
- Transformer Based Item Recommendation
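
On the determinism item: today's per-request knob is the `seed` in `SamplingParams`, which makes repeated identical requests reproducible; the roadmap work extends this so results also stay identical regardless of batch composition and prefix caching. A minimal sketch of the existing behavior (model name is a placeholder):

```python
# Sketch of today's seeded sampling: repeating the same request with the
# same seed reproduces the output, but batch composition can still perturb
# results; the roadmap item targets determinism across batching orders too.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(temperature=0.8, seed=1234, max_tokens=16)
out1 = llm.generate(["Once upon a time"], params)
out2 = llm.generate(["Once upon a time"], params)
assert out1[0].outputs[0].text == out2[0].outputs[0].text  # seeded repeats match
```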
FAQ
When will vLLM release 1.0?
We believe 1.0 means API stability and a great user experience. We will not commit to an exact release date. The criteria for 1.0 are:
- Stable user-facing APIs such as the CLI, LLM, and AsyncLLMEngine (see the minimal example after this list)
- Stable developer APIs such as logits processors, KV connectors, the model interface, and hardware/platform plugin interfaces
- Polished out-of-the-box user experience
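
For reference, a minimal example of the offline user-facing API that the 1.0 stability guarantee would cover (model name is a placeholder):

```python
# Minimal example of the user-facing offline API that 1.0 would stabilize.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(temperature=0.0, max_tokens=32)
for output in llm.generate(["What is vLLM?"], params):
    print(output.outputs[0].text)
```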
If any item you want is not on the roadmap, your suggestions and contributions are strongly welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.
Historical Roadmap: #15735, #11862, #9006, #5805, #3861, #2681, #244