TRL Updated version of VLM GRPO update along with GSPO #3132
Merged
39 commits
f911c32  Kept, padding logic (pluesclues)
2ba7f50  Made sure prediction step in rl.py allows logging for callbacks in RL… (pluesclues)
0c1bc4d  Merge branch 'unslothai:main' into main (pluesclues)
78336ce  updated llama.py to new online_dpo changes (pluesclues)
383aa9c  Update rl.py to make logic simpiler (pluesclues)
532af4f  Update rl.py, made sure tokenized_output on eval step was on same device (pluesclues)
49f77c1  Update rl.py, corrected tokenized_outputs to inputs (pluesclues)
7921aa7  Update rl.py, removed sagemaker stuff (pluesclues)
54f03ee  Update llama.py, figures out if there is right padding automatically (pluesclues)
a8d4168  Update llama.py, changed conditional statement for right padding slig… (pluesclues)
236b924  Update llama.py, updated OS.environ variable to temp variable (pluesclues)
76d73c6  Merge branch 'main' into main (pluesclues)
fa2e18e  Update rl.py, made it account for right padding in online dpo and rew… (pluesclues)
80f9cd2  Update llama.py, automatically figures out if right padding is needed (pluesclues)
ed1771a  Merge branch 'main' into main (pluesclues)
49d3844  Merge branch 'main' into main (pluesclues)
b0a9c65  Merge branch 'unslothai:main' into main (pluesclues)
30f3366  Update rl_replacements.py, fixed up passing image data to functions (pluesclues)
327053f  Merge branch 'unslothai:main' into vlm_grpo_update (pluesclues)
8af680f  Update rl_replacements.py, for VLM GRPO support with TRL (pluesclues)
5e0fbdb  Update rl_replacements.py, gspo added (pluesclues)
ba4fc39  Update rl.py, forgot about Online_DPO changes in this branch (pluesclues)
f9a2c18  Update rl.py, forgot to not include Online DPO PR changes (pluesclues)
36d3f97  Update llama.py, forgot to disinclude Online DPO PR changes (pluesclues)
9c11967  Merge branch 'unslothai:main' into vlm_grpo_update (pluesclues)
5a370a5  Merge branch 'unslothai:main' into vlm_grpo_update (pluesclues)
7e97306  Merge branch 'unslothai:main' into vlm_grpo_update (pluesclues)
a78b407  Merge branch 'unslothai:main' into vlm_grpo_update (pluesclues)
8266f9e  Update rl_replacements.py, updated generate and score completions to … (pluesclues)
b6fdf4d  Merge branch 'unslothai:main' into vlm_grpo_update (pluesclues)
0ad04a6  Merge branch 'unslothai:main' into vlm_grpo_update (pluesclues)
2b379ad  Update rl_replacements.py (pluesclues)
0eaf2ec  Update rl_replacements.py, fixed nan issues with vlms (pluesclues)
fb115fb  Update rl_replacements.py, added indent (pluesclues)
694f88e  Update rl_replacements.py, added attention mask to calculations of ol… (pluesclues)
5b4a03d  Merge branch 'unslothai:main' into vlm_grpo_update (pluesclues)
3e63ed2  Merge branch 'unslothai:main' into vlm_grpo_update (pluesclues)
b4b6c65  Update unsloth/models/rl_replacements.py (danielhanchen)
24712ca  Update unsloth/models/rl_replacements.py (danielhanchen)