Description
System Info
RTX 3090, driver 561.09
Windows 11
Python 3.12
PyTorch 2.6.0+cu124 (CUDA 12.4)
transformers 4.51.1
bitsandbytes 0.45.5
Reproduction
I'm running into trouble quantizing Phi-4-multimodal-instruct with bitsandbytes. The code to reproduce the error is:
```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoProcessor,
    BitsAndBytesConfig,
    GenerationConfig,
)

model_path = "<path_to_phi4_multimodal_from_HF_here>"

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map="cuda",
    torch_dtype=torch.bfloat16,
    # if you do not use Ampere or later GPUs, change attention to "eager"
    _attn_implementation="flash_attention_2",
    quantization_config=nf4_config,
)
generation_config = GenerationConfig.from_pretrained(model_path, "generation_config.json")

# build text-only inputs; prompt format follows the Phi-4 chat template
user_message = "<user_prompt_here>"
prompt = f"<|user|>{user_message}<|end|><|assistant|>"
inputs = processor(text=prompt, return_tensors="pt").to("cuda")

generate_ids = model.generate(
    **inputs,
    max_new_tokens=2000,
    generation_config=generation_config,
    num_logits_to_keep=1,
    num_beams=1,
)
```
This gives the following error:
` File "cache\huggingface\modules\transformers_modules\Phi-4-multimodal-instruct\modeling_phi4mm.py", line 1987, in set_lora_adapter
module.set_adapter(adapter_name)
File "ache\huggingface\modules\transformers_modules\Phi-4-multimodal-instruct\modeling_phi4mm.py", line 2107, in forward
self.set_lora_adapter('speech')
File "phi4.py", line 91, in <module>
**inputs,
max_new_tokens=2000,
generation_config=generation_config,
num_logits_to_keep=1,
num_beams=1 )
RuntimeError: only Tensors of floating point dtype can require gradients `
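For what it's worth, the error itself looks like PyTorch refusing to set `requires_grad` on an integer tensor, which is what bitsandbytes 4-bit weights are stored as. A minimal sketch of that low-level behavior (my guess at the mechanism, not the actual model code):

```python
import torch

# bitsandbytes stores 4-bit quantized weights packed into an integer dtype (torch.uint8).
# Asking such a tensor to track gradients fails with a RuntimeError about
# non-floating-point dtypes, which appears to be what set_lora_adapter() trips over
# once the base weights have been quantized.
packed = torch.zeros(8, dtype=torch.uint8)
packed.requires_grad_(True)  # raises RuntimeError: non-floating-point dtype cannot require gradients
```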
When `quantization_config` is removed from `from_pretrained`, the code works. The same code also works for the non-multimodal variants of Phi-4, so I suspect the problem lies in how bitsandbytes interacts with the LoRA adapters that ship with the multimodal model.
Thanks in advance for any guidance.
Expected behavior
4-bit quantization to work with Phi-4-multimodal-instruct the same way it does with the non-multimodal Phi-4 variants.