Please follow the troubleshooting guide to debug this. If it still doesn't work, please open a GitHub issue so we can investigate further. (GitHub Discussions aren't reviewed nearly as often.)
Run cmd:
After the weights are loaded into memory, the server takes a very long time to start, and for an 8B fp8 model it was using around 30 GB of memory once the weights were loaded. It then freezes for a long time and eventually fails. I have tried removing the
--quantization fp8
option, but nothing changed. It is running in a devcontainer in WSL2, and I have managed to start the server successfully only once. I'm not sure what happened.
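For reference, here is a rough back-of-envelope estimate of how much memory the weights alone should need. It is only a sketch: it assumes the model has about 8 billion parameters and that each parameter takes 1 byte in fp8 or 2 bytes in a 16-bit format. The helper name `weight_footprint_gib` is made up for this example and is not part of vLLM.

```python
# Back-of-envelope estimate of the raw weight footprint.
# Assumptions (not measured): ~8e9 parameters, 1 byte/param for fp8,
# 2 bytes/param for fp16/bf16. `weight_footprint_gib` is a hypothetical helper.

def weight_footprint_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the approximate size of the weights in GiB."""
    return num_params * bytes_per_param / 2**30

NUM_PARAMS = 8e9  # assumed parameter count for an "8B" model

fp8_gib = weight_footprint_gib(NUM_PARAMS, 1)   # fp8: 1 byte per parameter
fp16_gib = weight_footprint_gib(NUM_PARAMS, 2)  # 16-bit: 2 bytes per parameter

print(f"fp8 weights:    ~{fp8_gib:.1f} GiB")
print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
```

Under these assumptions the weights come to well under 30 GB in either format, so most of the observed usage would have to come from something other than the weights themselves.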
This is the configuration of the devcontainer:
This is the NVIDIA driver information in WSL2:

vLLM API server version 0.6.6.post1