
Potential Bug in Attention Mask Implementation for Listwise Ranking in M-Falcon #244

@foreverYoungGitHub

Thank you for sharing the DLRM implementation, which has significantly clarified the M-Falcon methodology described in the paper ❤

Understanding of M-Falcon's Attention Mask

From my understanding, M-Falcon uses the attention mask to control which historical items each of multiple targets can see, which keeps training and inference efficient. In the masks below, row i is the query position and column j is the key position; T means position i may attend to position j. For example, in a decoder-only (causal) setup with a sequence length of 4, the attention mask would look like:

T, F, F, F
T, T, F, F
T, T, T, F
T, T, T, T
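To make the row/column convention concrete, here is a minimal pure-Python sketch (not taken from the repository) that builds the causal mask above and applies it in a row-wise masked softmax. The helper names `causal_mask` and `masked_softmax` are hypothetical; a real model would do this on tensors, typically by setting masked scores to negative infinity before the softmax.

```python
import math

def causal_mask(seq_len):
    """Lower-triangular mask: query row i may attend to key column j iff j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def masked_softmax(scores, mask):
    """Row-wise softmax that assigns zero weight to masked-out (False) keys."""
    out = []
    for row_scores, row_mask in zip(scores, mask):
        # Subtract the largest visible score for numerical stability.
        peak = max(s for s, m in zip(row_scores, row_mask) if m)
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

mask = causal_mask(4)
for row in mask:
    print(", ".join("T" if v else "F" for v in row))

# Uniform toy scores: each query spreads its weight evenly over the keys it can see.
weights = masked_softmax([[0.0] * 4 for _ in range(4)], mask)
print([round(w, 3) for w in weights[2]])  # [0.333, 0.333, 0.333, 0.0]
```

Position 2 (the third row) sees three keys, so it splits its attention evenly among them and gives nothing to the future position.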

With M-Falcon applied to a pairwise ranking task, using a sequence length of 4 (2 history items followed by 2 targets), each target attends to the full history and to itself, but not to the other target. The attention mask is as follows:

T, F, F, F
T, T, F, F
T, T, T, F
T, T, F, T
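The pairwise mask above can be reproduced with a small sketch. This is an illustration of the pattern as I understand it, not the repository's code; the helper name `m_falcon_pairwise_mask` and its arguments are hypothetical, and it assumes the history items come first in the sequence, followed by the targets.

```python
def m_falcon_pairwise_mask(num_history, num_targets):
    """Hypothetical helper: causal over history; each target sees all history plus itself."""
    n = num_history + num_targets
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        if i < num_history:
            # History items: ordinary causal attention over earlier history.
            for j in range(i + 1):
                mask[i][j] = True
        else:
            # Target items: every history item, plus the target itself only.
            for j in range(num_history):
                mask[i][j] = True
            mask[i][i] = True
    return mask

for row in m_falcon_pairwise_mask(2, 2):
    print(", ".join("T" if v else "F" for v in row))
```

Because targets never see one another, each target is scored independently of the others in the same pass, which is what lets many candidates share one forward pass.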

Issue with Listwise Ranking Attention Mask

However, for a listwise ranking task, I would expect the attention mask for the same setup (sequence length of 4, 2 history items, 2 targets) to be:

T, F, F, F
T, T, F, F
T, T, T, T
T, T, T, T

This configuration allows all target items to see each other, which is essential for effective listwise ranking.
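For comparison, here is a sketch of the listwise mask I would expect. As before, `listwise_mask` is a hypothetical helper and assumes history items precede targets; history rows stay causal, while every target row can see the whole sequence, including all other targets.

```python
def listwise_mask(num_history, num_targets):
    """Hypothetical helper: causal over history; targets attend to every position."""
    n = num_history + num_targets
    mask = []
    for i in range(n):
        if i < num_history:
            # History rows never look ahead, so they cannot see any target.
            mask.append([j <= i for j in range(n)])
        else:
            # Target rows see all history and all targets (including each other).
            mask.append([True] * n)
    return mask

for row in listwise_mask(2, 2):
    print(", ".join("T" if v else "F" for v in row))
```

The only difference from the pairwise sketch is the target-to-target block, which becomes fully visible.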

Observed Behavior in Current Implementation

In the current DLRM implementation, the default (causal) attention mask appears to be used instead of the M-Falcon-style listwise mask described above. Since the target items appear to be ordered by retrieval score, this default mask prevents high-score items from attending to low-score items, which might inadvertently hurt ranking performance.
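The effect on target-to-target visibility can be checked with a short sketch. It assumes, as the observation above suggests, that the default mask is a plain causal mask and that targets are placed in descending retrieval-score order after the history; the target labels and the helper `visible_targets` are hypothetical.

```python
def causal_mask(seq_len):
    """Lower-triangular mask: query row i may attend to key column j iff j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def visible_targets(mask, num_history, target_ids):
    """For each target, list the *other* targets its row is allowed to attend to."""
    result = {}
    for k, tid in enumerate(target_ids):
        row = mask[num_history + k]
        result[tid] = [other for m, other in enumerate(target_ids)
                       if m != k and row[num_history + m]]
    return result

# Hypothetical targets, ordered by descending retrieval score.
targets = ["high", "mid", "low"]
num_history = 2
mask = causal_mask(num_history + len(targets))
print(visible_targets(mask, num_history, targets))
# {'high': [], 'mid': ['high'], 'low': ['high', 'mid']}
```

Under this assumption, the highest-scored target sees no other candidates at all, while the lowest-scored one sees every candidate above it.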

Inquiry

Is there a specific reason the default attention mask is used for listwise ranking instead of an M-Falcon-style mask that lets all target items see each other? If this is unintended, I wanted to flag it, since it may affect the performance of listwise ranking tasks.

Thank you once again for your excellent work and for providing such a valuable resource to the community!
