Build vllm from source with CPU support #6188
parkesorgua announced in Q&A

Replies: 2 comments
- I think vLLM still does not support inference for quantized models on CPUs; I'm looking for the same thing. I was able to run the Llama 3 8B bf16 model by following the instructions in the docs: I set up a Docker instance and got responses from the model. If you have any info about running quantized models on vLLM, please let me know! Thanks
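  For anyone following the same path, here is a minimal sketch (not from this thread) of serving a bf16 model through vLLM's OpenAI-compatible server once the CPU build is installed. The model name, dtype, port, and prompt are placeholders; adjust them to your setup.

  ```bash
  # Start vLLM's OpenAI-compatible server on the CPU build
  # (model name and dtype are illustrative; 8000 is the default port).
  python3 -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --dtype bfloat16

  # From another shell, send a test request.
  curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "prompt": "Hello, my name is", "max_tokens": 32}'
  ```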
- Check the first line of https://docs.vllm.ai/en/latest/getting_started/cpu-installation.html#
- I am trying to run a small quantized model on my notebook. As I understand it, running on a CPU requires building from source. I followed the instructions at https://docs.vllm.ai/en/latest/getting_started/cpu-installation.html. After running

  VLLM_TARGET_DEVICE=cpu python setup.py install

  there is no binary in the build folder, only 3 other folders. How do I build vLLM from source to get a binary file? OS: Ubuntu 24
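  Not part of the original post, but one note that may explain the missing binary: this build does not produce a standalone executable. It compiles vLLM's C++ extensions and installs the result as a Python package into the active environment, so there is nothing to look for in build/. A quick sanity check of the install might look like this (facebook/opt-125m is only an example model):

  ```bash
  # Confirm the package built and installed into the current environment.
  python -c "import vllm; print(vllm.__version__)"

  # Smoke-test offline inference on CPU with a small model.
  python -c "from vllm import LLM; print(LLM(model='facebook/opt-125m').generate('Hello, my name is'))"
  ```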