@nikg4 @taenin What other functions should we put into the `GrpoRewards` struct? Other reward functions? Here is a first cut from Sonnet:
- Policy evaluation functions:
  - `evaluate_policy(policy, environment)`: measure the performance of a given policy
  - `compute_advantage(values, rewards)`: calculate advantage estimates
  - `compute_returns(rewards, gamma)`: calculate cumulative returns
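
To make the two numeric helpers concrete, here is a rough sketch of what they might look like. It is only a starting point, and several things are assumptions: the functions are free functions rather than methods on `GrpoRewards`, rewards and values are `f64` slices for a single trajectory, and the advantage is taken as the discounted return minus the value estimate. `compute_advantage` also takes an extra `gamma` argument that is not in the list above, since the returns need a discount factor.

```rust
/// Discounted return for each timestep: G_t = r_t + gamma * G_{t+1}.
/// Assumes `rewards` holds one trajectory in time order.
pub fn compute_returns(rewards: &[f64], gamma: f64) -> Vec<f64> {
    let mut returns = vec![0.0; rewards.len()];
    let mut running = 0.0;
    // Walk backwards so each return reuses the one that follows it.
    for (i, &reward) in rewards.iter().enumerate().rev() {
        running = reward + gamma * running;
        returns[i] = running;
    }
    returns
}

/// Advantage estimate A_t = G_t - V(s_t).
/// Assumption: `gamma` is an extra parameter added for this sketch, and
/// `values[t]` is the value estimate for the state at timestep `t`.
pub fn compute_advantage(values: &[f64], rewards: &[f64], gamma: f64) -> Vec<f64> {
    assert_eq!(
        values.len(),
        rewards.len(),
        "values and rewards must cover the same timesteps"
    );
    compute_returns(rewards, gamma)
        .iter()
        .zip(values)
        .map(|(ret, value)| ret - value)
        .collect()
}
```

With rewards `[1.0, 1.0, 1.0]` and `gamma = 0.5`, the returns come out as `[1.75, 1.5, 1.0]`. If we want group-relative advantages instead (normalizing rewards within a group of samples), that would be a different function and is not what this sketch does.
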

Implement the `evaluate_policy`, `compute_advantage`, and `compute_returns` functions in Rust.
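
For `evaluate_policy`, one possible shape is below. Everything about the interfaces is hypothetical: the `Environment` and `Policy` traits, the `episodes` and `max_steps` parameters, and the `CountdownEnv` / `NoOp` toy types exist only for this sketch and are not taken from the current codebase. "Performance" is assumed here to mean the mean undiscounted episode return.

```rust
/// Hypothetical environment interface, assumed for this sketch.
pub trait Environment {
    type State;
    type Action;
    /// Start a new episode and return the initial state.
    fn reset(&mut self) -> Self::State;
    /// Apply an action; returns (next_state, reward, done).
    fn step(&mut self, action: &Self::Action) -> (Self::State, f64, bool);
}

/// Hypothetical policy interface, assumed for this sketch.
pub trait Policy<S, A> {
    fn act(&self, state: &S) -> A;
}

/// Mean undiscounted return over `episodes` rollouts, each capped at `max_steps`.
pub fn evaluate_policy<E, P>(policy: &P, env: &mut E, episodes: usize, max_steps: usize) -> f64
where
    E: Environment,
    P: Policy<E::State, E::Action>,
{
    if episodes == 0 {
        return 0.0; // Nothing to average over.
    }
    let mut total = 0.0;
    for _ in 0..episodes {
        let mut state = env.reset();
        for _ in 0..max_steps {
            let action = policy.act(&state);
            let (next_state, reward, done) = env.step(&action);
            total += reward;
            state = next_state;
            if done {
                break;
            }
        }
    }
    total / episodes as f64
}

/// Toy environment: pays 1.0 per step and ends after `start` steps.
pub struct CountdownEnv {
    pub start: u32,
    pub remaining: u32,
}

impl Environment for CountdownEnv {
    type State = u32;
    type Action = ();
    fn reset(&mut self) -> u32 {
        self.remaining = self.start;
        self.remaining
    }
    fn step(&mut self, _action: &()) -> (u32, f64, bool) {
        self.remaining = self.remaining.saturating_sub(1);
        (self.remaining, 1.0, self.remaining == 0)
    }
}

/// Toy policy that ignores the state.
pub struct NoOp;

impl Policy<u32, ()> for NoOp {
    fn act(&self, _state: &u32) {}
}
```

Keeping the traits generic over state and action types means the same evaluation loop could be reused if we later swap in a real environment, but whether this belongs on `GrpoRewards` or in a separate module is still an open question.
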
Originally posted by @kyjohnso in #13