Replies: 1 comment
-
@WoosukKwon Can someone give me suggestions on how to run vLLM with multiple GPUs? Thanks!
-
I attempted multi-GPU inference on Llama-13B (8 A100 GPUs). I first ran `ray start --head` on the head node and `ray start --address='30.152.83.253:6379'` on the other node, then modified offline_inference.py to use `llm = LLM(model="./models/open_llama_13b", tensor_parallel_size=4)`, and ran `python3 offline_inference.py`.
I got the following output:
INFO worker.py:1452 -- Connecting to existing Ray cluster at address: 30.152.83.253:6379...
INFO worker.py:1636 -- Connected to Ray cluster.
Then it blocked with no further log output. How can I tell whether my program is running successfully? I cannot find the generated output.
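For reference, a minimal sketch of the modified offline_inference.py described above (the prompts and sampling parameters are placeholders; the model path and tensor_parallel_size=4 are taken from the post):

```python
# offline_inference.py -- minimal sketch of the setup described above.
# Assumes the Ray cluster has already been started:
#   head node:   ray start --head
#   worker node: ray start --address='30.152.83.253:6379'
from vllm import LLM, SamplingParams

# Example prompts; replace with your own.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# tensor_parallel_size shards the model across GPUs (4 here, as in the post above).
llm = LLM(model="./models/open_llama_13b", tensor_parallel_size=4)

# generate() blocks until completion and returns one RequestOutput per prompt;
# printing the results makes it easy to confirm the run actually finished.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```

If the script hangs after "Connected to Ray cluster" with no generated text printed, the run has not completed; the output only appears once `llm.generate()` returns.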