Hi,
I noticed that in your GRPO finetuning structure, all reward functions are defined in src/train/reward_funcs.py, and the training script automatically loads any function whose name ends with _reward.
I’d like to add a semantic reward function that uses an external embedding model such as SentenceTransformer (e.g., intfloat/e5-base) to compute cosine similarity between the model’s answer and a reference answer.
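For concreteness, here is roughly what I have in mind. The signature is my guess based on the _reward convention — I'm assuming the trainer passes the generated completions plus a reference column from the dataset and expects a list of float rewards:

```python
from sentence_transformers import SentenceTransformer

_embedder = None

def semantic_similarity_reward(completions, reference, **kwargs):
    """Cosine similarity between each completion and its reference answer."""
    global _embedder
    if _embedder is None:
        # Lazy load inside the reward function -- this is where ZeRO-3 breaks.
        _embedder = SentenceTransformer("intfloat/e5-base")
    # e5 checkpoints expect a "query: "/"passage: " prefix on their inputs.
    comp = _embedder.encode([f"query: {c}" for c in completions],
                            convert_to_tensor=True, normalize_embeddings=True)
    ref = _embedder.encode([f"passage: {r}" for r in reference],
                           convert_to_tensor=True, normalize_embeddings=True)
    # Both sides are L2-normalized, so the row-wise dot product is the cosine.
    return (comp * ref).sum(dim=-1).tolist()
```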
However, when I try to load SentenceTransformer inside that _reward function during GRPO training (with DeepSpeed ZeRO-3), I get this runtime error:
[reward] encode failed: 'weight' must be 2-D
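My current suspicion (please correct me if I'm wrong): with ZeRO-3 active, transformers keeps a global HfDeepSpeedConfig and wraps from_pretrained() in deepspeed.zero.Init(), so the embedder's parameters get partitioned into 1-D shards, and the embedding lookup then fails with the error above. One workaround I'm considering is temporarily unsetting that global config around the load so the reward model stays whole on each rank — a minimal sketch, assuming transformers' DeepSpeed integration helpers:

```python
from sentence_transformers import SentenceTransformer
from transformers.integrations.deepspeed import (
    is_deepspeed_zero3_enabled,
    unset_hf_deepspeed_config,
)

def load_embedder_unpartitioned(name: str = "intfloat/e5-base"):
    """Load the embedding model without ZeRO-3 weight partitioning."""
    if is_deepspeed_zero3_enabled():
        # Drop the global HfDeepSpeedConfig so the from_pretrained() call
        # inside SentenceTransformer is not wrapped in deepspeed.zero.Init().
        unset_hf_deepspeed_config()
    # Keep the reward model off the training GPUs entirely.
    return SentenceTransformer(name, device="cpu")
```

I'm wary of this because unsetting the global config could affect anything else loaded afterwards, which is part of why I'm asking.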
My question:
What is the recommended way to use such external models (SentenceTransformer, CLIP, etc.) as reward functions in your GRPO setup?
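For example, is the intended pattern to keep the embedder on CPU outside the DeepSpeed engine, to score on a single rank and broadcast the rewards, or to run the embedder as a separate inference service? Here is a minimal sketch of the broadcast mechanics for the single-rank option, assuming torch.distributed is already initialized by the trainer (broadcast_rank0_scores is a hypothetical helper of mine, not something from your repo):

```python
import torch
import torch.distributed as dist

def broadcast_rank0_scores(scores, num_scores, device):
    """Broadcast rewards computed on rank 0 to every other rank.

    `scores` is the list of floats computed on rank 0 (ignored elsewhere);
    all ranks must call this collectively with the same `num_scores`.
    With the NCCL backend, `device` should be this rank's own GPU.
    """
    if dist.get_rank() == 0:
        t = torch.tensor(scores, dtype=torch.float32, device=device)
    else:
        t = torch.empty(num_scores, dtype=torch.float32, device=device)
    dist.broadcast(t, src=0)
    return t.tolist()
```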