Skip to content

Conversation

@pluesclues
Copy link
Collaborator

@pluesclues pluesclues commented Sep 5, 2025

Fixes the grpo nan issues we have been having with ga steps > 1, tested on h100 and collab on T4. This PR was created mainly to avoid passing a SPDA attention mask so it would not eat up a lot of memory. Relies on unslothai/unsloth-zoo#265.

# Selective log softmax
selective_log_softmax_code = inspect.getsource(selective_log_softmax)

#GRPO masking code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#GRPO masking code
# GRPO masking code


# The new lines you want to insert
replacement_lines = """batch_size = self.args.per_device_train_batch_size if mode == "train" else self.args.per_device_eval_batch_size
prompt_completion_ids = left_pack_padding(prompt_completion_ids, self.processing_class.pad_token_id)"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe newline?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to easierly resolve merge conflicts when testing with the Fast VLM infernece branch I moved everything in this PR to: #3132

https://github.com/pluesclues/unsloth/blob/fb115fb16cb2592caf99a9414b7d1f95f1f819ca/unsloth/models/rl_replacements.py#L252-L256

Copy link
Contributor

@danielhanchen danielhanchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work

@pluesclues pluesclues closed this Sep 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants