- Have you compared with https://github.com/NVIDIA/FasterTransformer and NVIDIA/FasterTransformer#506?

- Thanks for the question. Yes, we have compared performance with FasterTransformer in our research paper (to be released soon). vLLM achieves up to a 22x speedup over FasterTransformer. The main gain comes from the PagedAttention and continuous batching implemented in vLLM.
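For intuition, here is a minimal Python sketch of the block-table idea behind PagedAttention. This is illustrative only, not vLLM's actual implementation: names like `BlockAllocator` and `Sequence` are hypothetical.

```python
# Sketch of PagedAttention-style KV-cache paging (illustrative only;
# BlockAllocator / Sequence are hypothetical names, not vLLM internals).
BLOCK_SIZE = 4  # tokens stored per KV-cache block


class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def alloc(self):
        return self.free.pop()

    def release(self, blocks):
        self.free.extend(blocks)


class Sequence:
    """Tracks one request's logical-to-physical block mapping."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []  # logical block i -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # A new physical block is claimed only when the last one fills,
        # so memory is reserved on demand rather than for the max length.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1


allocator = BlockAllocator(num_blocks=8)
a, b = Sequence(allocator), Sequence(allocator)
for _ in range(6):
    a.append_token()  # 6 tokens -> 2 blocks
b.append_token()      # 1 token  -> 1 block
print(a.block_table, b.block_table, allocator.free)
```

Because physical blocks are reserved on demand and returned to the pool when a request finishes, memory is not pre-allocated for the maximum sequence length, which is what lets the scheduler keep many requests in flight and batch them continuously.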