Fast model loading #209
Replies: 4 comments 3 replies
-
Yes, model caching at both the cluster and node level is in scope and part of the roadmap. Let's evolve a design that plays well with the rest of llm-d.
-
vLLM actually has a few different extensions for model loaders. One of them is the Run:ai Model Streamer, which uses multiple threads to read tensors concurrently from a file in file or object storage into a dedicated buffer in CPU memory.
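For reference, this is roughly how that loader gets selected in vLLM as I understand its current options; the concurrency value is purely illustrative and the bucket path is hypothetical:

```python
from vllm import LLM

# Pick the Run:ai Model Streamer instead of the default safetensors loader.
# "concurrency" is the number of threads streaming tensors into the CPU buffer;
# 16 is just an example value, not a recommendation.
llm = LLM(
    model="s3://example-bucket/llama-3-8b",   # hypothetical object-storage location
    load_format="runai_streamer",
    model_loader_extra_config={"concurrency": 16},
)
```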
-
A lesser-known model loader extension is fastsafetensors, which can be used when NVMe drives are available.
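If I remember correctly it is wired into vLLM through the same load_format switch; something along these lines, though the exact option string is worth double-checking against the vLLM docs:

```python
from vllm import LLM

# Use the fastsafetensors loader so weights sitting on local NVMe can be moved
# to the GPU directly (e.g. via GPU Direct Storage) instead of being staged
# through CPU memory. The load_format value here is my recollection, not gospel.
llm = LLM(
    model="/mnt/nvme/models/llama-3-8b",  # hypothetical node-local NVMe path
    load_format="fastsafetensors",
)
```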
-
I'd like to start a discussion on adding a fast model loading capability to llm-d. The model service seems like the right starting point.
The goal is to ensure that safetensors model files are close to where vLLM instances are created (i.e., on the worker nodes). When worker nodes have NVMe local storage, these files should be stored on it to enable direct transfer from storage to GPU.
GIE defines the concept of an InferenceModel, and whatever solution we come up with should play nicely with that concept.
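To make the intent concrete, here is a rough sketch of the node-level behaviour I have in mind. The cache path and helper are hypothetical, and huggingface_hub is used purely for illustration; designing the actual mechanism is exactly what this discussion is for:

```python
import os
from huggingface_hub import snapshot_download

# Hypothetical node-local NVMe mount used as a model cache.
NVME_CACHE = "/mnt/nvme/model-cache"

def ensure_local_copy(model_id: str) -> str:
    """Return a node-local directory holding the model's safetensors files,
    downloading them onto NVMe first if they are not already present."""
    local_dir = os.path.join(NVME_CACHE, model_id.replace("/", "--"))
    if not os.path.isdir(local_dir):
        # Fetch only the weights and config files we actually need.
        snapshot_download(
            repo_id=model_id,
            local_dir=local_dir,
            allow_patterns=["*.safetensors", "*.json", "*.model"],
        )
    return local_dir

# vLLM on that node is then pointed at the local path, so loading can go
# straight from NVMe (e.g. with fastsafetensors) rather than remote storage.
model_path = ensure_local_copy("meta-llama/Llama-3.1-8B-Instruct")
```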
@sriumcp what's your opinion?
/cc @fabolive @manoelmarques @aavarghese