Multi-node models fail inference #142

@francois-rd

Description

Describe the bug

Multi-node models (for example, Mistral-Large-Instruct-2411 or Meta-Llama-3.1-405B-Instruct with their default configs) load successfully at launch: vec-inf launch completes normally, and vec-inf status eventually reports the server as READY and returns a base URL. However, as soon as a client makes an inference request, the server hangs with (APIServer pid=13) RuntimeError: There is no current event loop in thread 'MPClientEngineMonitor'.

To Reproduce

vec-inf launch Mistral-Large-Instruct-2411
# ... wait for server to launch ...
vec-inf status $JOB_ID
# -> will eventually show model status as READY and give a base URL
# Running inference with a client (e.g., by strictly following https://github.com/VectorInstitute/vector-inference/blob/main/examples/inference/llm/chat_completions.py) causes the server to hang; see the client sketch below
vec-inf status $JOB_ID
# -> now shows model status as FAILED and error as: (APIServer pid=13) RuntimeError: There is no current event loop in thread 'MPClientEngineMonitor'
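
For reference, a minimal client sketch of the failing request, assuming an OpenAI-compatible endpoint (the base URL and model name below are placeholders; substitute the values reported by vec-inf status):

from openai import OpenAI

# Placeholder: substitute the base URL reported by `vec-inf status`
# (vLLM-backed servers typically accept any api_key value)
client = OpenAI(base_url="http://<base-url>/v1", api_key="EMPTY")

# This is the request that triggers the hang on multi-node models
completion = client.chat.completions.create(
    model="Mistral-Large-Instruct-2411",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)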

Expected behavior

The server should not hang; inference should complete normally.

Screenshots

N/A

Version

  • v0.7 (on Killarney)

Additional context

N/A
