As far as I know, vLLM launches a profile_run (https://github.com/vllm-project/vllm/blob/main/vllm/worker/worker.py#L174) to measure the model's peak memory usage at the maximum sequence length. However, I don't see any OOM error even when the GPU memory is not enough to fit the model_max_len context length. How does profile_run avoid an OOM when it runs the model with dummy input?
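To make the question concrete, here is a minimal sketch of what I understand the profile run to do, using PyTorch's allocator statistics. This is not vLLM's actual implementation; the function name profile_peak_memory and its parameters are hypothetical.

```python
import torch

def profile_peak_memory(model, max_seq_len, vocab_size, device="cuda"):
    # Hypothetical sketch, not vLLM's code: run one forward pass with
    # dummy input at the maximum sequence length and read back the peak
    # GPU memory observed by PyTorch's caching allocator.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats(device)

    # Dummy token IDs shaped like a worst-case (max-length) request.
    dummy_input = torch.randint(0, vocab_size, (1, max_seq_len), device=device)

    with torch.inference_mode():
        model(dummy_input)

    torch.cuda.synchronize(device)
    peak = torch.cuda.max_memory_allocated(device)
    total = torch.cuda.get_device_properties(device).total_memory
    # The remaining headroom (total - peak, scaled by a utilization
    # factor) is what would then be carved into KV-cache blocks.
    return peak, total
```

My confusion is about this forward pass itself: if the GPU cannot hold a full model_max_len sequence, why doesn't this step OOM?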
Looking forward to getting an answer from the community. Thanks.