Phi 4 Multimodal not working with BnB/4bit quantization #1600


Description

@palladium123

System Info

RTX 3090, driver 561.09
Windows 11
Python 3.12
PyTorch 2.6.0+cu124 (CUDA 12.4)
Transformers 4.51.1
bitsandbytes 0.45.5

Reproduction

I am running into trouble quantizing Phi-4-multimodal with bitsandbytes. The code to reproduce the error:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoProcessor,
    BitsAndBytesConfig,
    GenerationConfig,
)

model_path = "<path_to_phi4_multimodal_from_HF_here>"

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map="cuda",
    torch_dtype=torch.bfloat16,
    # if you do not use Ampere or later GPUs, change attention to "eager"
    _attn_implementation="flash_attention_2",
    quantization_config=nf4_config,
)

generation_config = GenerationConfig.from_pretrained(model_path, "generation_config.json")

user_message = "<user_prompt_here>"

# prompt construction simplified; inputs come from the processor
inputs = processor(text=user_message, return_tensors="pt").to("cuda")

generate_ids = model.generate(
    **inputs,
    max_new_tokens=2000,
    generation_config=generation_config,
    num_logits_to_keep=1,
    num_beams=1,
)
```

This gives the following error:

` File "cache\huggingface\modules\transformers_modules\Phi-4-multimodal-instruct\modeling_phi4mm.py", line 1987, in set_lora_adapter
    module.set_adapter(adapter_name)
  File "ache\huggingface\modules\transformers_modules\Phi-4-multimodal-instruct\modeling_phi4mm.py", line 2107, in forward
    self.set_lora_adapter('speech')
  File "phi4.py", line 91, in <module>
    **inputs,

            max_new_tokens=2000,

            generation_config=generation_config,

            num_logits_to_keep=1,

            num_beams=1 )

RuntimeError: only Tensors of floating point dtype can require gradients `
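
For context, this is the generic error PyTorch raises whenever `requires_grad` is set on a non-floating-point tensor, and bitsandbytes packs NF4 weights into `uint8` storage. A two-line snippet (my own illustration, not from the model code) reproduces the message:

```python
import torch

# bitsandbytes stores 4-bit weights packed into uint8 tensors; setting
# requires_grad on any integer tensor raises the same error as above.
packed = torch.zeros(8, dtype=torch.uint8)
packed.requires_grad_(True)
# RuntimeError: only Tensors of floating point dtype can require gradients
```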

When `quantization_config` is removed from `from_pretrained`, the code works. The same code also works for the non-multimodal variants of Phi-4, so I suspect the problem lies in how bitsandbytes interacts with the LoRA adapters that ship with the multimodal model.
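
One possible workaround (untested; the `"lora_A"`/`"lora_B"` patterns below are a guess based on standard PEFT naming, so check `model.named_modules()` for the real names) would be to exclude the bundled adapter layers from quantization via `llm_int8_skip_modules`, which, despite its name, transformers appears to honor on the 4-bit path as well:

```python
import torch
from transformers import BitsAndBytesConfig

# Hypothetical workaround: keep the bundled LoRA layers in bfloat16 so
# set_adapter() still toggles requires_grad on floating-point tensors.
# "lora_A"/"lora_B" are assumed PEFT-style names, not verified for Phi-4-MM.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["lora_A", "lora_B"],
)
```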

Thanks in advance for any guidance.

Expected behavior

4-bit quantization to work the same way it does for the non-multimodal variants of Phi-4.
