vllm-top is a Python package for monitoring and displaying metrics from a running vLLM service. It provides a comprehensive dashboard that visualizes both current state and historical performance, making it easy to track and analyze service behavior over time.
- Task State Visibility: Instantly see GPU cache usage and the counts of running and waiting requests, helping you debug bottlenecks and improve throughput.
- Minimalist Monitoring: Lightweight dashboard that parses metrics directly from vLLM's Prometheus-format metrics endpoint.
- Quick Setup: No extra configuration — just pip install and run.
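To make the "parses metrics directly from Prometheus" point concrete, here is a minimal sketch of parsing Prometheus-style exposition text of the kind vLLM emits. The `parse_metrics` helper and the sample metric names are illustrative assumptions, not vllm-top's actual implementation:

```python
def parse_metrics(text):
    """Parse Prometheus exposition text into a dict of metric name -> float value."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (# HELP / # TYPE).
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            continue
    return metrics

# Sample exposition text; the metric names here are assumptions for illustration.
sample = """\
# HELP vllm:num_requests_running Number of requests currently running
vllm:num_requests_running 3.0
vllm:num_requests_waiting 7.0
vllm:gpu_cache_usage_perc 0.42
"""
parsed = parse_metrics(sample)
```

Each scrape reduces to a flat name-to-value mapping, which is all a terminal dashboard needs to render current state.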
Install via pip:

```shell
pip install vllm-top
```

Start monitoring:

```shell
vllm-top
```

Change the update interval (in seconds):

```shell
vllm-top --interval 5
```

Get a one-time snapshot:

```shell
vllm-top --snapshot
```
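The polling behavior behind `--interval` can be sketched as a simple fetch loop. This is an illustrative sketch, not vllm-top's actual code; the default URL assumes vLLM serving its Prometheus metrics at `/metrics` on localhost port 8000:

```python
import time
import urllib.request

def fetch_metrics(url="http://localhost:8000/metrics", timeout=5):
    """Fetch raw Prometheus exposition text from a metrics endpoint (assumed URL)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def monitor(fetch, interval=5.0, iterations=None):
    """Call `fetch` every `interval` seconds; iterations=None loops forever."""
    results = []
    count = 0
    while iterations is None or count < iterations:
        results.append(fetch())
        count += 1
        # Sleep only between fetches, not after the final one.
        if iterations is None or count < iterations:
            time.sleep(interval)
    return results
```

A one-time snapshot (as with `--snapshot`) corresponds to a single `fetch_metrics()` call, while the dashboard mode corresponds to running the loop indefinitely.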
Contributions are welcome! Please submit a pull request or open an issue for enhancements or bug fixes.
Licensed under the MIT License. See the LICENSE file for details.
See CHANGELOG.md for a detailed history of changes.