Multi-node models fail inference #142

@francois-rd

Description

Describe the bug

Multi-node models (for example, Mistral-Large-Instruct-2411 or Meta-Llama-3.1-405B-Instruct with their default configs) load successfully at launch: vec-inf launch completes normally, and vec-inf status eventually reports the server as READY and returns a base URL. However, as soon as a client makes an inference request, the server hangs with (APIServer pid=13) RuntimeError: There is no current event loop in thread 'MPClientEngineMonitor'.

To Reproduce

vec-inf launch Mistral-Large-Instruct-2411
# ... wait for server to launch ...
vec-inf status $JOB_ID
# -> will eventually show model status as READY and give a base URL
# Running inference with a client (e.g., by strictly following https://github.com/VectorInstitute/vector-inference/blob/main/examples/inference/llm/chat_completions.py) causes the server to hang; see the client sketch below
vec-inf status $JOB_ID
# -> now shows model status as FAILED and error as: (APIServer pid=13) RuntimeError: There is no current event loop in thread 'MPClientEngineMonitor'
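
For reference, a minimal client sketch of the failing request, assuming an OpenAI-compatible endpoint (the base URL and model name below are placeholders; substitute the values reported by vec-inf status):

from openai import OpenAI

# Placeholder: substitute the base URL reported by `vec-inf status`
# (vLLM-backed servers typically accept any api_key value)
client = OpenAI(base_url="http://<base-url>/v1", api_key="EMPTY")

# This is the request that triggers the hang on multi-node models
completion = client.chat.completions.create(
    model="Mistral-Large-Instruct-2411",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)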

Expected behavior

The server should not hang; inference should complete normally.

Screenshots

N/A

Version

  • v0.7 (on Killarney)

Additional context

N/A
