Running Inference on Multiple GPUs #132

@Mukil07

Description

I'm trying to run DeepSeek-VL2 on 2 A6000 GPUs (48 GB each). When I modify the model loading from:

  vl_gpt: DeepseekVLV2ForCausalLM = AutoModelForCausalLM.from_pretrained(
      model_path,
      trust_remote_code=True,
      torch_dtype=dtype,
  )

to:

  vl_gpt: DeepseekVLV2ForCausalLM = AutoModelForCausalLM.from_pretrained(
      model_path,
      trust_remote_code=True,
      torch_dtype=dtype,
      device_map="auto"
  )

it throws an error saying that all tensors should be on the same device:

  File "DeepSeek-VL2/deepseek_vl2/models/modeling_deepseek.py", line 108, in forward
    return self.weight * hidden_states.to(input_dtype)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Please let me know if there is a way to run multi-GPU inference. I have already tried the split_model function from an earlier issue, but that still didn't work.
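
Would an explicit device map computed with accelerate be the right direction? Below is a minimal, untested sketch of that idea: it keeps each decoder layer on a single GPU via no_split_module_classes so per-layer tensors stay co-located. The layer class name "DeepseekV2DecoderLayer", the memory caps, and the model path are assumptions on my side (the exact class name should be checked in modeling_deepseek.py).

  import torch
  from accelerate import infer_auto_device_map, init_empty_weights
  from transformers import AutoConfig, AutoModelForCausalLM

  model_path = "deepseek-ai/deepseek-vl2"  # adjust to the local checkpoint
  config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)

  # Build an empty (meta-device) model only to compute a device map.
  with init_empty_weights():
      empty_model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

  # Keep each decoder layer intact on one GPU; class name is an assumption,
  # check modeling_deepseek.py for the exact name.
  device_map = infer_auto_device_map(
      empty_model,
      max_memory={0: "44GiB", 1: "44GiB"},
      no_split_module_classes=["DeepseekV2DecoderLayer"],
      dtype=torch.bfloat16,
  )

  vl_gpt = AutoModelForCausalLM.from_pretrained(
      model_path,
      trust_remote_code=True,
      torch_dtype=torch.bfloat16,
      device_map=device_map,
  )

Inputs would presumably still need to be placed on the device holding the embedding layer (usually cuda:0) before calling generate.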
