Replies: 1 comment
-
@WoosukKwon Can someone give me suggestions on how to run vLLM with multiple GPUs? Thanks!
-
I attempted multi-GPU inference on Llama-13B (8 A100 GPUs). I first ran `ray start --head` on the head node and `ray start --address='30.152.83.253:6379'` on the other node, then modified offline_inference.py to use `llm = LLM(model="./models/open_llama_13b", tensor_parallel_size=4)`, and ran `python3 offline_inference.py`.
I got the following output:
INFO worker.py:1452 -- Connecting to existing Ray cluster at address: 30.152.83.253:6379...
INFO worker.py:1636 -- Connected to Ray cluster.
Then it blocked with no further log output. How can I tell whether my program is running successfully? I cannot find the generated output.
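For reference, a minimal sketch of the modified offline_inference.py described above (the prompts and sampling parameters are placeholders; the model path and tensor_parallel_size=4 are taken from the post):

```python
# offline_inference.py -- minimal sketch of the setup described above.
# Assumes the Ray cluster has already been started:
#   head node:   ray start --head
#   worker node: ray start --address='30.152.83.253:6379'
from vllm import LLM, SamplingParams

# Example prompts; replace with your own.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# tensor_parallel_size shards the model across GPUs (4 here, as in the post above).
llm = LLM(model="./models/open_llama_13b", tensor_parallel_size=4)

# generate() blocks until completion and returns one RequestOutput per prompt;
# printing the results makes it easy to confirm the run actually finished.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```

If the script hangs after "Connected to Ray cluster" with no generated text printed, the run has not completed; the output only appears once `llm.generate()` returns.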