[Rust] GrpoRewards: Additional Reward Utility Functions

> @nikg4 @taenin what other functions should we put into the `GrpoRewards` struct? Other reward functions? Here is a first cut from Sonnet:
> 
> For a `grpo_rewards` package written in Rust and wrapped for Python, I'd suggest focusing on these key functions:
> 
> 1. **Reward Calculation Functions**:
>    - `calculate_reward(actions, state)` - Core function to compute rewards based on agent actions and environment state
>    - `discount_rewards(rewards, gamma)` - Apply temporal discounting to reward sequences
>    - `normalize_rewards(rewards)` - Standardize rewards for stable training
> 

Implement `discount_rewards` and `normalize_rewards`.

 _Originally posted by @kyjohnso in [#13](https://github.com/oumi-ai/roumi/issues/13#issuecomment-2745276290)_
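To make the request concrete, here is a minimal sketch of what the two functions might look like as plain Rust free functions. The `f64` slices, the population standard deviation, and the `EPS` constant are all assumptions for illustration; nothing here reflects the existing `GrpoRewards` API or how it is exposed to Python.

```rust
/// Hypothetical guard against division by zero when all rewards are equal.
const EPS: f64 = 1e-8;

/// Discounted returns, computed back to front:
/// G_t = r_t + gamma * G_{t+1}, with G past the last step taken as 0.
pub fn discount_rewards(rewards: &[f64], gamma: f64) -> Vec<f64> {
    let mut returns = vec![0.0; rewards.len()];
    let mut running = 0.0;
    for (i, &r) in rewards.iter().enumerate().rev() {
        running = r + gamma * running;
        returns[i] = running;
    }
    returns
}

/// Standardize rewards to zero mean and (approximately) unit variance.
/// Assumes the population standard deviation; an empty input yields an
/// empty output, and identical rewards all map to 0.0.
pub fn normalize_rewards(rewards: &[f64]) -> Vec<f64> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let n = rewards.len() as f64;
    let mean = rewards.iter().sum::<f64>() / n;
    let variance = rewards.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n;
    let std = variance.sqrt();
    rewards.iter().map(|r| (r - mean) / (std + EPS)).collect()
}
```

For example, `discount_rewards(&[1.0, 1.0, 1.0], 0.5)` gives `[1.75, 1.5, 1.0]`. Whether normalization should run over a whole batch or per group of completions is an open design question for the struct.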

[Rust] GrpoRewards: Additional Reward Utility Functions #16

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Rust] GrpoRewards: Additional Reward Utility Functions #16

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions