
Implement GRPO #1271

@MischaPanch

By now, GRPO (Group Relative Policy Optimization, pioneered by DeepSeek) is widely used. Several implementations and variations already exist and can serve as a starting point. It's also a fairly simple algorithm.
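To illustrate why the algorithm is simple: its core idea is to replace a learned value baseline with statistics computed over a group of samples that share the same starting point (for example, several completions of one prompt). The following is a minimal sketch of that normalization step, not an existing API of this library. The function name `group_relative_advantages`, the `(num_groups, group_size)` array layout, and the `eps` stabilizer are assumptions made for illustration.

```python
import numpy as np


def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize rewards within each group (hypothetical helper).

    rewards: array of shape (num_groups, group_size), one scalar reward
             per sampled output. This layout is an assumption.
    Returns an array of the same shape holding (r - mean) / (std + eps),
    where mean and std are taken over each group separately.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    # eps avoids division by zero when all rewards in a group are equal.
    return (rewards - mean) / (std + eps)


rewards = np.array([[1.0, 0.0, 0.0, 1.0], [2.0, 2.0, 2.0, 2.0]])
adv = group_relative_advantages(rewards)
```

A group in which every sample receives the same reward yields all-zero advantages, so it contributes no gradient signal. Whether to divide by the standard deviation at all is one of the variations worth deciding on.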

The implementation should be analogous to other on-policy algorithms.
We should survey the research papers on GRPO and decide which variations we want to support from the start.
Performance should be tested and compared to PPO.
After an implementation and first tests with the low-level API, GRPO should also be included in the high-level API. That can happen in a separate follow-up PR.
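Regarding the analogy to other on-policy algorithms and the comparison with PPO: GRPO, as commonly described, reuses PPO's clipped surrogate objective and only changes where the advantages come from. Below is a rough NumPy sketch of that shared loss term, assuming per-sample log-probabilities and precomputed advantages. The name `clipped_surrogate_loss` and its parameters are hypothetical, and the KL penalty towards a reference policy that the original GRPO formulation adds is deliberately omitted here.

```python
import numpy as np


def clipped_surrogate_loss(
    log_prob_new: np.ndarray,
    log_prob_old: np.ndarray,
    advantages: np.ndarray,
    clip_eps: float = 0.2,
) -> float:
    """PPO-style clipped surrogate loss (hypothetical helper, to be minimized).

    All inputs are 1-D arrays with one entry per sample. For GRPO the
    advantages would be group-relative; for PPO they would typically
    come from a value-function-based estimator.
    """
    ratio = np.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) objective per sample, then negate
    # the mean so that minimizing the loss maximizes the objective.
    return float(-np.minimum(unclipped, clipped).mean())


# Identical policies: ratio is 1, so the loss is -mean(advantages).
loss_same = clipped_surrogate_loss(np.zeros(2), np.zeros(2), np.array([1.0, -1.0]))
# Ratio of 2 with a positive advantage is clipped at 1 + clip_eps.
loss_clipped = clipped_surrogate_loss(np.log([2.0]), np.zeros(1), np.array([1.0]))
```

Because this term is the same for both algorithms, a comparison against PPO mostly isolates the effect of the advantage estimator and of any additional regularization.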

@arnaujc91 will take this issue
