Possible bug in tutorial #4071

Hi,

In this tutorial (https://huggingface.co/learn/cookbook/fine_tuning_vlm_mpo) you show how to perform MPO/DPO on multimodal data with a Qwen model. However, looking at the Qwen model, it seems that it does not rely only on "pixel_values" for its computation but also on "image_grid_thw". I therefore believe the DPO implementation is missing the forwarding of all the processed keys required for a correct computation. That said, I could be wrong and missing some steps. Can you clarify?
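To make the concern concrete, here is a minimal sketch, assuming a processor that returns `input_ids`, `attention_mask`, `pixel_values` and `image_grid_thw` (as Qwen2-VL-style processors do). The helper names (`extra_model_inputs`, `build_forward_kwargs`, `dropped_keys`) are hypothetical and not part of TRL or transformers; the idea is simply to forward every non-text key the processor produces instead of only `pixel_values`.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical helpers for illustration only (not TRL or transformers API).
# Keys describing the text sequence; any other key returned by the processor
# is treated as an extra model input that must also be forwarded.
TEXT_KEYS = frozenset({"input_ids", "attention_mask"})


def extra_model_inputs(processed: Dict[str, Any]) -> Dict[str, Any]:
    """Return every non-text key produced by the processor (e.g. pixel_values, image_grid_thw)."""
    return {k: v for k, v in processed.items() if k not in TEXT_KEYS}


def build_forward_kwargs(processed: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the text inputs plus *all* extra inputs, not just pixel_values."""
    kwargs = {k: processed[k] for k in TEXT_KEYS if k in processed}
    kwargs.update(extra_model_inputs(processed))
    return kwargs


def dropped_keys(processed: Dict[str, Any], forwarded: Iterable[str]) -> List[str]:
    """List processor outputs that a forward call would silently ignore."""
    return sorted(set(processed) - set(forwarded))


if __name__ == "__main__":
    # Dummy batch shaped like a Qwen2-VL-style processor output (placeholder values).
    batch = {
        "input_ids": [[101, 102, 103]],
        "attention_mask": [[1, 1, 1]],
        "pixel_values": [[0.1, 0.2]],
        "image_grid_thw": [[1, 2, 2]],
    }
    # Forwarding only pixel_values (the suspected behaviour) drops image_grid_thw.
    print(dropped_keys(batch, ["input_ids", "attention_mask", "pixel_values"]))
    # → ['image_grid_thw']
    print(sorted(build_forward_kwargs(batch)))
    # → ['attention_mask', 'image_grid_thw', 'input_ids', 'pixel_values']
```

In a real training step the values would be tensors and the result would be passed as `model(**kwargs)`; Qwen2-VL's `forward()` accepts an `image_grid_thw` argument alongside `pixel_values`, which is why dropping it looks suspicious to me.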

Thank you
