Possible bug in tutorial #4071

@GabrieleGiudic

Hi,

In this tutorial, https://huggingface.co/learn/cookbook/fine_tuning_vlm_mpo, you show how to perform MPO/DPO on multimodal data with a Qwen model. However, looking at the Qwen model, its computation appears to rely not only on "pixel_values" but also on "image_grid_thw", so I believe the DPO implementation does not forward all the processed keys needed for a correct computation. I could be wrong or missing a step; can you clarify?
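To make the concern concrete, here is a minimal sketch of the kind of check I have in mind. It uses plain Python stand-ins rather than real tensors, and `missing_vision_keys` and `VISION_KEYS` are hypothetical names for illustration, not part of TRL or transformers. The assumption is that the processor output contains both "pixel_values" and "image_grid_thw", and that a forward call built from only some of those keys silently drops the rest.

```python
# Hypothetical helper for illustration; not part of TRL or transformers.
# Assumption: the processor output for a Qwen-style VLM contains both
# "pixel_values" and "image_grid_thw" alongside the text inputs.
VISION_KEYS = ("pixel_values", "image_grid_thw")


def missing_vision_keys(forward_kwargs, processed, vision_keys=VISION_KEYS):
    """Return vision keys the processor produced but the forward call omits."""
    return [k for k in vision_keys if k in processed and k not in forward_kwargs]


# Stand-in values; real entries would be tensors returned by the processor.
processed = {
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
    "pixel_values": [[0.1, 0.2]],
    "image_grid_thw": [[1, 2, 2]],
}

# A forward call that passes only pixel_values drops image_grid_thw.
incomplete = {
    "input_ids": processed["input_ids"],
    "pixel_values": processed["pixel_values"],
}
print(missing_vision_keys(incomplete, processed))  # ['image_grid_thw']

# Forwarding every processed key leaves nothing missing.
print(missing_vision_keys(dict(processed), processed))  # []
```

If the trainer builds the model inputs from a fixed list of keys, a check like this would show whether "image_grid_thw" reaches the forward pass.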

Thank you

Labels: question, bug, documentation
