Sensei, this PR contains a few minor potential fixes:
- Removed the unused `label_smoothing` argument from the docstring of `def compute_dpo_loss`. I believe you initially experimented with conservative DPO, and this was a remnant left in the docstring. Alternatively, we could add it back, initialized as `label_smoothing=0` for the original DPO behavior, and then change

  ```python
  losses = -F.logsigmoid(beta * logits)
  ```

  to

  ```python
  losses = -F.logsigmoid(beta * logits) * (1 - label_smoothing) - F.logsigmoid(-beta * logits) * label_smoothing
  ```

  Reference: Eq. 3 in https://ericmitchell.ai/cdpo.pdf
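  For context, a minimal sketch of what the smoothed loss could look like in full; the signature and argument names here are assumptions for illustration, not the PR's actual implementation:

  ```python
  import torch
  import torch.nn.functional as F

  def compute_dpo_loss(model_chosen_logprobs, model_rejected_logprobs,
                       reference_chosen_logprobs, reference_rejected_logprobs,
                       beta=0.1, label_smoothing=0.0):
      """DPO loss; label_smoothing > 0 gives conservative DPO (cDPO).

      Argument names are hypothetical, based on the discussion above.
      """
      model_logratios = model_chosen_logprobs - model_rejected_logprobs
      reference_logratios = reference_chosen_logprobs - reference_rejected_logprobs
      logits = model_logratios - reference_logratios

      # Eq. 3 in https://ericmitchell.ai/cdpo.pdf;
      # label_smoothing=0 recovers the original DPO loss.
      losses = (-F.logsigmoid(beta * logits) * (1 - label_smoothing)
                - F.logsigmoid(-beta * logits) * label_smoothing)
      return losses.mean()
  ```

  With `label_smoothing=0` the second term vanishes, so the change is backward compatible.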
- Potential typo in `def compute_logprobs()`, depending on the intent: the resulting shape of `avg_log_prob` would be `(batch_size,)`, not `(batch_size, num_tokens)`, regarding the comment "This averages over the tokens, so the shape is", i.e. the average log probability per sequence. If you meant the initial shape before averaging, please ignore this.
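  A quick toy check of the shape, assuming the usual mean over the token dimension:

  ```python
  import torch

  batch_size, num_tokens = 2, 5
  log_probs = torch.randn(batch_size, num_tokens)  # per-token log probabilities

  # Averaging over the token dimension collapses it,
  # leaving one average log probability per sequence.
  avg_log_prob = log_probs.mean(dim=-1)
  print(avg_log_prob.shape)  # torch.Size([2]), i.e. (batch_size,)
  ```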
- Simplified `def train_model_dpo_simple()`'s loop to a plain `for` loop. `enumerate()` was used, but the indices were never used.
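  The simplification is just the usual pattern; the loop body and data here are hypothetical stand-ins, since the actual training loop isn't shown in this comment:

  ```python
  def process(batch):
      # hypothetical stand-in for the real training step
      return batch

  data_loader = [("batch_0",), ("batch_1",)]  # stand-in for the real DataLoader

  # Before: enumerate() produced an index that was never used
  results_before = []
  for idx, batch in enumerate(data_loader):
      results_before.append(process(batch))

  # After: a plain for loop, identical behavior
  results_after = []
  for batch in data_loader:
      results_after.append(process(batch))
  ```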