Describe the bug
Multi-node models (for example, Mistral-Large-Instruct-2411 or Meta-Llama-3.1-405B-Instruct with default configs) launch successfully: vec-inf launch completes normally, and vec-inf status even reports the server as READY and returns a base URL. However, as soon as a client makes an inference request, the server hangs with:
(APIServer pid=13) RuntimeError: There is no current event loop in thread 'MPClientEngineMonitor'
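The RuntimeError itself is standard asyncio behavior rather than anything vec-inf-specific: a worker thread has no event loop unless one is explicitly created and set for it, so calling asyncio.get_event_loop() from such a thread raises exactly this error. A minimal standalone sketch (the thread name is copied from the log purely for illustration):

```python
import asyncio
import threading

def monitor() -> None:
    # A non-main thread has no event loop unless one is explicitly
    # created and installed with asyncio.set_event_loop(), so
    # asyncio.get_event_loop() raises RuntimeError here.
    try:
        asyncio.get_event_loop()
    except RuntimeError as e:
        print(f"RuntimeError: {e}")

# Thread name copied from the vLLM log for illustration only.
t = threading.Thread(target=monitor, name="MPClientEngineMonitor")
t.start()
t.join()
# prints a RuntimeError matching the one in the log
```

This suggests the monitor thread inside the API server is calling into asyncio without an event loop having been set for it.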
To Reproduce
vec-inf launch Mistral-Large-Instruct-2411
# ... wait for server to launch ...
vec-inf status $JOB_ID
# -> will eventually show model status as READY and give a base URL
# Sending an inference request from a client (e.g., by strictly following
# https://github.com/VectorInstitute/vector-inference/blob/main/examples/inference/llm/chat_completions.py)
# causes the server to hang
vec-inf status $JOB_ID
# -> now shows model status as FAILED and error as: (APIServer pid=13) RuntimeError: There is no current event loop in thread 'MPClientEngineMonitor'
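Any OpenAI-compatible chat completion request triggers the hang. A minimal stdlib-only client sketch (the base URL below is a placeholder for the one returned by vec-inf status; the commented-out urlopen call is where an affected server stalls):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible POST to the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Base URL is a placeholder; use the one from `vec-inf status $JOB_ID`.
req = build_chat_request("http://localhost:8080/v1",
                         "Mistral-Large-Instruct-2411", "Hello")
# urllib.request.urlopen(req)  # <- on an affected server this call hangs
```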
Expected behavior
The server should not hang; inference should proceed as normal.
Screenshots
N/A
Version
- v0.7 (on Killarney)
Additional context
N/A