Skip to content

Minor DPO fixes #617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 16, 2025
Merged

Minor DPO fixes #617

merged 2 commits into from
Apr 16, 2025

Conversation

casinca
Copy link
Contributor

@casinca casinca commented Apr 13, 2025

Sensei, this PR concerns some minor potential fixes:

  • Removed unused label_smoothing argument in the docstring of def compute_dpo_loss.
    I believe you initially experimented with conservative DPO but there was a remnant left in the docstring.

    Alternatively we could add it back initialized label_smoothing=0, for the original DPO behavior.
    Then change the:
    losses = -F.logsigmoid(beta * logits)
    to
    losses = -F.logsigmoid(beta * logits) * (1 - label_smoothing) - F.logsigmoid(-beta * logits) * label_smoothing

    reference# Eq. 3 https://ericmitchell.ai/cdpo.pdf

  • Potential typo depending on the intent in def compute_logprobs().
    avg_log_prob resulting shape would be (batch_size,) and not (batch_size, num_tokens) concerning the comment "This averages over the tokens, so the shape is", ie the avg log proba per sequence.

    If you meant the initial shape before averaging then please ignore it.

  • Simplified def train_model_dpo_simple()'s to a simple for loop. Enumerate() was used but indexes were never used.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@rasbt rasbt merged commit 1b242d0 into rasbt:main Apr 16, 2025
13 checks passed
@rasbt
Copy link
Owner

rasbt commented Apr 16, 2025

This looks good, thanks for the PR! The label smoothing was a leftover I forgot to remove when I simplified the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants