[Rust] GrpoRewards: Additional Reward Utility Functions #16

@kyjohnso

@nikg4 @taenin What other functions should we put into the GrpoRewards struct? Other reward functions? Here is a first cut from Sonnet:

For a grpo_rewards package written in Rust and wrapped for Python, I'd suggest focusing on these key functions:

  1. Reward Calculation Functions:
    • calculate_reward(actions, state) - Core function to compute rewards based on agent actions and environment state
    • discount_rewards(rewards, gamma) - Apply temporal discounting to reward sequences
    • normalize_rewards(rewards) - Standardize rewards for stable training

Implement discount_rewards and normalize_rewards.
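
As a starting point for discussion, here is a minimal sketch of what the two functions could look like in plain Rust. Everything beyond the two function names and the `rewards` / `gamma` parameters is an assumption: the `f64` element type, the free-function form (rather than methods on GrpoRewards), the population standard deviation, and the `EPS` constant that guards against division by zero are all placeholders, and the Python wrapper layer is left out entirely.

```rust
/// Hypothetical guard against division by zero when all rewards are equal.
const EPS: f64 = 1e-8;

/// Discounted returns: G_t = r_t + gamma * G_{t+1}, computed right to left.
pub fn discount_rewards(rewards: &[f64], gamma: f64) -> Vec<f64> {
    let mut returns = vec![0.0; rewards.len()];
    let mut running = 0.0;
    // Walk backwards so each step reuses the discounted sum of the future.
    for (i, &r) in rewards.iter().enumerate().rev() {
        running = r + gamma * running;
        returns[i] = running;
    }
    returns
}

/// Standardize rewards to zero mean and (approximately) unit variance.
/// Uses the population standard deviation; this choice is an assumption.
pub fn normalize_rewards(rewards: &[f64]) -> Vec<f64> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let n = rewards.len() as f64;
    let mean = rewards.iter().sum::<f64>() / n;
    let variance = rewards.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();
    rewards.iter().map(|r| (r - mean) / (std_dev + EPS)).collect()
}
```

For example, `discount_rewards(&[1.0, 1.0, 1.0], 0.5)` returns `[1.75, 1.5, 1.0]`. If these end up being used for GRPO-style advantages, `normalize_rewards` would presumably be applied per group of completions rather than across a whole batch, but that is a design decision still open in this issue.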

Originally posted by @kyjohnso in #13

Labels: enhancement (New feature or request)
