Running Inference on Multiple GPUs #132

@Mukil07

Description

I'm trying to run DeepSeek-VL2 on 2 A6000 GPUs (48 GB each). When I modify the model loading from:

  vl_gpt: DeepseekVLV2ForCausalLM = AutoModelForCausalLM.from_pretrained(
      model_path,
      trust_remote_code=True,
      torch_dtype=dtype,
  )

to:

  vl_gpt: DeepseekVLV2ForCausalLM = AutoModelForCausalLM.from_pretrained(
      model_path,
      trust_remote_code=True,
      torch_dtype=dtype,
      device_map="auto"
  )

it throws an error saying that all tensors should be on the same device:

  File "DeepSeek-VL2/deepseek_vl2/models/modeling_deepseek.py", line 108, in forward
    return self.weight * hidden_states.to(input_dtype)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Please let me know if there is a way to run multi-GPU inference. I have already tried the split_model function from an earlier issue, but that still didn't work.
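
Would an explicit device map computed with accelerate be the right direction? Below is a minimal, untested sketch of that idea: it keeps each decoder layer on a single GPU via no_split_module_classes so per-layer tensors stay co-located. The layer class name "DeepseekV2DecoderLayer", the memory caps, and the model path are assumptions on my side (the exact class name should be checked in modeling_deepseek.py).

  import torch
  from accelerate import infer_auto_device_map, init_empty_weights
  from transformers import AutoConfig, AutoModelForCausalLM

  model_path = "deepseek-ai/deepseek-vl2"  # adjust to the local checkpoint
  config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)

  # Build an empty (meta-device) model only to compute a device map.
  with init_empty_weights():
      empty_model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

  # Keep each decoder layer intact on one GPU; class name is an assumption,
  # check modeling_deepseek.py for the exact name.
  device_map = infer_auto_device_map(
      empty_model,
      max_memory={0: "44GiB", 1: "44GiB"},
      no_split_module_classes=["DeepseekV2DecoderLayer"],
      dtype=torch.bfloat16,
  )

  vl_gpt = AutoModelForCausalLM.from_pretrained(
      model_path,
      trust_remote_code=True,
      torch_dtype=torch.bfloat16,
      device_map=device_map,
  )

Inputs would presumably still need to be placed on the device holding the embedding layer (usually cuda:0) before calling generate.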
