Replies: 1 comment 1 reply
-
Can you explain what you mean by decoupled mode? Also, feel free to write the question in Chinese.
-
Sorry, I'm not a native English speaker, so please forgive my poor English.
I want to use Triton as our model inference server, with vLLM as the backend. However, vLLM's Triton backend only supports decoupled mode. Is it possible to implement non-decoupled mode myself? Is there anything I should be aware of?
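To make the distinction concrete: in Triton, a decoupled model may send zero, one, or many responses per request, while a non-decoupled model returns exactly one response per request. The sketch below illustrates one way the second behavior could be layered on top of the first, by buffering the streamed chunks and returning them as a single result. It is plain Python with no Triton or vLLM imports; `generate_tokens`, `serve_decoupled`, and `serve_non_decoupled` are hypothetical names used for illustration only, not part of either project's API.

```python
from typing import Callable, Iterator, List


def generate_tokens(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming engine: yields output chunk by chunk."""
    for word in prompt.split():
        yield word.upper()


def serve_decoupled(prompt: str, send: Callable[[str, bool], None]) -> None:
    """Decoupled style: push one response per chunk, then an empty final marker."""
    for chunk in generate_tokens(prompt):
        send(chunk, False)
    send("", True)  # signals that no more responses will follow


def serve_non_decoupled(prompt: str) -> str:
    """Non-decoupled style: buffer every chunk and return exactly one response."""
    chunks: List[str] = []

    def collect(chunk: str, final: bool) -> None:
        # Ignore the empty final marker; keep only real output chunks.
        if not final:
            chunks.append(chunk)

    serve_decoupled(prompt, collect)
    return " ".join(chunks)
```

The main trade-off of the buffered approach is that the client receives nothing until generation has finished, so time to first token equals total generation time.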