generated from fastai/nbdev_template
Open
Labels
❓ question: Seeking clarification or more information
🐛 bug: Something isn't working
📚 documentation: Improvements or additions to documentation
Description
Hi,
In this tutorial, https://huggingface.co/learn/cookbook/fine_tuning_vlm_mpo, you show how to perform MPO/DPO on multimodal data with a Qwen model. However, looking at the Qwen model, it seems to rely not only on "pixel_values" for its computation but also on "image_grid_thw". I therefore believe the DPO implementation does not forward all the processed keys needed for a correct computation. I could be wrong or missing a step, so can you clarify?
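To make the concern concrete, here is a minimal sketch of the check I have in mind. It is not the tutorial's or any library's actual code: `missing_vision_keys` is a hypothetical helper, the dictionaries are plain stand-ins for a processor output, and the set of forwarded keys is an assumption made only for illustration.

```python
# Hypothetical check: which vision inputs were produced by the processor
# but would not reach the model's forward pass?
# The key names are the two mentioned in this issue.
REQUIRED_VISION_KEYS = ("pixel_values", "image_grid_thw")


def missing_vision_keys(processor_output, forwarded_keys):
    """Return vision keys present in the processor output but not forwarded."""
    return [
        key
        for key in REQUIRED_VISION_KEYS
        if key in processor_output and key not in forwarded_keys
    ]


# Stand-in for a processor output; real values would be tensors.
processor_output = {
    "input_ids": [[1, 2, 3]],
    "pixel_values": [[0.1, 0.2]],
    "image_grid_thw": [[1, 2, 2]],
}

# Assumed set of keys that the training step passes on to the model.
forwarded_keys = {"input_ids", "attention_mask", "pixel_values"}

print(missing_vision_keys(processor_output, forwarded_keys))
```

If the result is non-empty for the real processor output and the keys the DPO step actually forwards, that would confirm the gap described above.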
Thank you