Replies: 1 comment
That error was resolved by enabling the dashboard in `ray.init` and adding `worker_use_ray=True` to the engine arguments:

```python
ray.init(
    include_dashboard=True,
    dashboard_host="0.0.0.0",
)
# ...
args = AsyncEngineArgs(
    model="/your/model/path",
    engine_use_ray=True,
    worker_use_ray=True,  # added this
    gpu_memory_utilization=0.5,
)
```

While the errors given in my first post are resolved, I'm still not sure this is the proper way to go about it, and I would appreciate some feedback, and possibly some links to functions in the source that may be of interest. For example, if nodes with GPUs are added to the cluster, do I really need to re-initialize the engine with an updated …? Other than those questions, I think this can be closed.
Your current environment
How would you like to use vllm
I would like to pre-initialize my Ray cluster and have `AsyncLLMEngine` use it. I had thought this would be as simple as passing `engine_use_ray` after creating my cluster; however, something goes wrong when vLLM calls `ray.init` after I already have. I end up getting messages from Ray along the lines of:
Here's how I'm trying to use it:

I'm running this in a local Python notebook and would recommend you do the same when attempting to re-create it. You should see the error messages I mentioned after the Ray init code block; the first one Ray sent arrived roughly 6 seconds after calling `AsyncLLMEngine.from_engine_args()`. I know that doing this must be possible, but I lost the notebook in which I did it. As a side note, it may be nice to allow users to pass Ray options to engine initialization if it turns out that using an existing Ray cluster is more difficult than it seems.
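For reference, the setup described above can be sketched as follows. The model path and memory fraction are placeholders, and `engine_use_ray`/`worker_use_ray` are the flags from the vLLM version discussed in this thread, so they may have changed in newer releases; this is a sketch of the intended flow, not a verified recipe:

```python
import ray
from vllm import AsyncEngineArgs, AsyncLLMEngine

# Connect to Ray *before* vLLM has a chance to initialize it itself.
# "auto" attaches to an already-running cluster rather than starting a new one.
ray.init(address="auto")

args = AsyncEngineArgs(
    model="/your/model/path",    # placeholder path
    engine_use_ray=True,         # run the engine loop as a Ray actor
    gpu_memory_utilization=0.5,
)
engine = AsyncLLMEngine.from_engine_args(args)
```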