Fix grpo nan #3278

pluesclues · 2025-09-05T19:50:15Z

Fixes the GRPO NaN issues we have been having with gradient accumulation steps > 1; tested on an H100 and on a T4 in Colab. This PR was created mainly to avoid passing an SDPA attention mask, so that the mask does not consume a lot of memory. Relies on unslothai/unsloth-zoo#265.
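For a sense of scale, a dense attention mask grows quadratically with sequence length. The back-of-the-envelope sketch below assumes a mask of shape (batch, 1, seq_len, seq_len); the batch size, sequence length, and element widths are illustrative values, not figures measured in this PR.

```python
def dense_mask_bytes(batch_size: int, seq_len: int, bytes_per_element: int) -> int:
    """Size in bytes of a dense (batch, 1, seq_len, seq_len) attention mask."""
    return batch_size * seq_len * seq_len * bytes_per_element


MIB = 1024 ** 2

# Illustrative values only (assumptions, not taken from the PR).
bool_mask = dense_mask_bytes(batch_size=8, seq_len=4096, bytes_per_element=1)
half_mask = dense_mask_bytes(batch_size=8, seq_len=4096, bytes_per_element=2)

print(f"bool mask: {bool_mask / MIB:.0f} MiB")
print(f"fp16 mask: {half_mask / MIB:.0f} MiB")
```

Doubling the sequence length quadruples both numbers, which is why skipping the explicit mask can matter on smaller GPUs such as a T4.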

danielhanchen · 2025-09-09T06:59:07Z

unsloth/models/rl.py

    # Selective log softmax
    selective_log_softmax_code = inspect.getsource(selective_log_softmax)

+    #GRPO masking code


Suggested change

-    #GRPO masking code
+    # GRPO masking code

danielhanchen · 2025-09-09T07:00:18Z

unsloth/models/rl_replacements.py

+
+    # The new lines you want to insert
+    replacement_lines = """batch_size = self.args.per_device_train_batch_size if mode == "train" else self.args.per_device_eval_batch_size
+        prompt_completion_ids = left_pack_padding(prompt_completion_ids, self.processing_class.pad_token_id)"""


Maybe newline?
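The diff above builds a string of lines to splice into existing trainer source. A minimal sketch of that general pattern follows, using a toy function and a hypothetical `insert_after` helper; neither is taken from Unsloth, and the actual splice logic in `rl_replacements.py` may differ.

```python
import textwrap


def insert_after(source: str, anchor: str, new_lines: str) -> str:
    """Return `source` with `new_lines` inserted after the first line containing `anchor`.

    The inserted lines reuse the anchor line's indentation.
    Hypothetical helper for illustration only.
    """
    out = []
    inserted = False
    for line in source.splitlines():
        out.append(line)
        if not inserted and anchor in line:
            indent = line[: len(line) - len(line.lstrip())]
            out.extend(indent + new for new in textwrap.dedent(new_lines).splitlines())
            inserted = True
    if not inserted:
        raise ValueError(f"anchor not found: {anchor!r}")
    return "\n".join(out) + "\n"


original = '''\
def step(x):
    y = x + 1
    return y
'''

patched = insert_after(original, "y = x + 1", "y = y * 2")

namespace = {}
exec(patched, namespace)  # compile the patched source into a fresh namespace
result = namespace["step"](3)
print(patched)
print(result)
```

Matching indentation is the fragile part of this approach, which is why a stray or missing newline in the inserted string is worth checking in review.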

To make it easier to resolve merge conflicts when testing with the Fast VLM inference branch, I moved everything in this PR to #3132.

https://github.com/pluesclues/unsloth/blob/fb115fb16cb2592caf99a9414b7d1f95f1f819ca/unsloth/models/rl_replacements.py#L252-L256
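The inserted lines call `left_pack_padding(prompt_completion_ids, self.processing_class.pad_token_id)`. Assuming the name means "move all real tokens to the left so that padding ends up on the right", the idea can be sketched in plain Python as below. The `left_pack` function is a hypothetical stand-in operating on lists, not the unsloth-zoo implementation, which works on tensors.

```python
from typing import List


def left_pack(rows: List[List[int]], pad_id: int) -> List[List[int]]:
    """Move every non-pad token to the left of its row, keeping token order.

    Hypothetical stand-in for illustration; not the unsloth-zoo implementation.
    """
    packed = []
    for row in rows:
        tokens = [t for t in row if t != pad_id]
        # Refill with padding on the right so the row length is unchanged.
        packed.append(tokens + [pad_id] * (len(row) - len(tokens)))
    return packed


PAD = 0  # illustrative pad token id
batch = [
    [0, 0, 5, 6, 7, 8, 0],  # left-padded prompt followed by a right-padded completion
    [1, 2, 3, 4, 9, 0, 0],
]
packed = left_pack(batch, PAD)
print(packed)
```

This sketch treats every occurrence of `pad_id` as padding, which matters when the pad token doubles as another special token.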

danielhanchen

Nice work

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

pluesclues added 26 commits June 22, 2025 20:59

Kept, padding logic

f911c32

Made sure prediction step in rl.py allows logging for callbacks in RL trainers

2ba7f50

Merge branch 'unslothai:main' into main

0c1bc4d

updated llama.py to new online_dpo changes

78336ce

Update rl.py to make logic simpler

383aa9c

Update rl.py, made sure tokenized_output on eval step was on same device

532af4f

Update rl.py, corrected tokenized_outputs to inputs

49f77c1

Update rl.py, removed sagemaker stuff

7921aa7

Update llama.py, figures out if there is right padding automatically

54f03ee

Update llama.py, changed conditional statement for right padding slightly

a8d4168

Update llama.py, updated OS.environ variable to temp variable

236b924

Merge branch 'main' into main

76d73c6

Update rl.py, made it account for right padding in online dpo and reward modeling

fa2e18e

Update llama.py, automatically figures out if right padding is needed

80f9cd2

Merge branch 'main' into main

ed1771a

Merge branch 'main' into main

49d3844

Merge branch 'unslothai:main' into main

b0a9c65

Merge branch 'unslothai:main' into main

6edcb0d

Merge branch 'unslothai:main' into main

90c581b

Merge branch 'unslothai:main' into fix_grpo_nan

0d2b9dc

Update rl_replacements.py

5df4532

Update rl.py

eb65ecf

Update rl.py, changed order of util functions for padding

4751abf

Update rl_replacements.py, disabled commenting out logits_to_keep

d86953b

Update llama.py

190c2c0

Merge branch 'unslothai:main' into fix_grpo_nan

0b9068c

danielhanchen reviewed Sep 9, 2025

danielhanchen reviewed Sep 9, 2025

danielhanchen requested changes Sep 9, 2025

pluesclues and others added 5 commits September 9, 2025 12:37

Update unsloth/models/rl.py

1fba36c

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

Merge branch 'unslothai:main' into fix_grpo_nan

fa48726

Update rl_replacements.py

fad14ca

Update rl_replacements.py, added new line

6aedc2f

Update rl_replacements.py, updated version

4bcd41e

pluesclues closed this Sep 16, 2025
