I am curious how to split a large language model (LLM) into smaller pieces and dispatch it across multiple GPUs using the vLLM library.
For example, in the transformers library, passing `device_map="auto"` to `AutoModelForCausalLM.from_pretrained` allows the LLM to be split and loaded across multiple GPUs, like this:
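Roughly the following sketch (the model name here is only an example, not my exact setup):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-70B-Instruct"  # example checkpoint

# device_map="auto" lets accelerate shard the model's layers across all
# visible GPUs (spilling to CPU/disk if they don't fit).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```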
Does vLLM have a similar feature? What parameters should I add to the following code to dispatch the LLM across GPUs?

When I use `model = LLM(model="neuralmagic/Meta-Llama-3-70B-Instruct-quantized.w4a16", tensor_parallel_size=8)`, i.e. with `tensor_parallel_size=8` added, I can see logs like `(VllmWorkerProcess pid=364103) INFO 07-12 22:51:00 model_runner.py:255] Loading model weights took 4.9631 GB`. However, after a few seconds each GPU ends up using almost all of its memory (46974 / 49140 MB).
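For completeness, here is roughly the full script I am running (the prompt and sampling settings below are placeholders, not my real workload):

```python
from vllm import LLM, SamplingParams

# Load the quantized 70B model with 8-way tensor parallelism, so that
# each GPU should only need to hold roughly 1/8 of the weights.
model = LLM(
    model="neuralmagic/Meta-Llama-3-70B-Instruct-quantized.w4a16",
    tensor_parallel_size=8,
)

# Placeholder prompt and sampling settings, just to exercise the model.
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = model.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```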
Thank you for your help!