@nikg4 @taenin What other functions should we put into the GrpoRewards struct? Other reward functions? Here is a first cut from Sonnet:
For a grpo_rewards package written in Rust and wrapped for Python, I'd suggest focusing on these key functions:
- Reward Calculation Functions:
  - calculate_reward(actions, state) - Core function to compute rewards based on agent actions and environment state
  - discount_rewards(rewards, gamma) - Apply temporal discounting to reward sequences
  - normalize_rewards(rewards) - Standardize rewards for stable training
Implement discount_rewards and normalize_rewards.
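To make the two functions concrete, here is a rough sketch of what they might look like on the Rust side. Everything beyond the two function names and the `rewards` / `gamma` parameters is an assumption for discussion: rewards as a flat `&[f64]` slice (for GRPO this would typically be one group of completions for the same prompt), population standard deviation, and a small `EPS` constant to avoid dividing by zero when all rewards are equal. The Python wrapping is left out.

```rust
/// Assumed stabilizer so a group of identical rewards does not divide by zero.
const EPS: f64 = 1e-8;

/// Discounted returns, computed right to left:
/// G_t = r_t + gamma * G_{t+1}, with G past the last step taken as 0.
pub fn discount_rewards(rewards: &[f64], gamma: f64) -> Vec<f64> {
    let mut returns = vec![0.0; rewards.len()];
    let mut running = 0.0;
    for (i, &r) in rewards.iter().enumerate().rev() {
        running = r + gamma * running;
        returns[i] = running;
    }
    returns
}

/// Standardize rewards to zero mean and (approximately) unit standard deviation.
/// Uses the population standard deviation; returns an empty Vec for empty input.
pub fn normalize_rewards(rewards: &[f64]) -> Vec<f64> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let n = rewards.len() as f64;
    let mean = rewards.iter().sum::<f64>() / n;
    let variance = rewards.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n;
    let std = variance.sqrt();
    rewards.iter().map(|r| (r - mean) / (std + EPS)).collect()
}
```

For example, `discount_rewards(&[1.0, 1.0, 1.0], 0.5)` gives `[1.75, 1.5, 1.0]`. Whether these should be free functions or methods on GrpoRewards, and whether `EPS` should be configurable, is still open.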
Originally posted by @kyjohnso in #13