Hi,
I noticed that in your GRPO finetuning structure, all reward functions are defined in src/train/reward_funcs.py, and the training script automatically loads any function whose name ends with _reward.
I’d like to add a semantic reward function that uses an external embedding model such as SentenceTransformer (e.g., intfloat/e5-base) to compute cosine similarity between the model’s answer and a reference answer.
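For concreteness, here is roughly what I have in mind. The signature is my guess based on the _reward convention — I'm assuming the trainer passes the generated completions plus a reference column from the dataset and expects a list of float rewards:

```python
from sentence_transformers import SentenceTransformer

_embedder = None

def semantic_similarity_reward(completions, reference, **kwargs):
    """Cosine similarity between each completion and its reference answer."""
    global _embedder
    if _embedder is None:
        # Lazy load inside the reward function -- this is where ZeRO-3 breaks.
        _embedder = SentenceTransformer("intfloat/e5-base")
    # e5 checkpoints expect a "query: "/"passage: " prefix on their inputs.
    comp = _embedder.encode([f"query: {c}" for c in completions],
                            convert_to_tensor=True, normalize_embeddings=True)
    ref = _embedder.encode([f"passage: {r}" for r in reference],
                           convert_to_tensor=True, normalize_embeddings=True)
    # Both sides are L2-normalized, so the row-wise dot product is the cosine.
    return (comp * ref).sum(dim=-1).tolist()
```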
However, when I try to load SentenceTransformer inside that _reward function during GRPO training (with DeepSpeed ZeRO-3), I get this runtime error:
[reward] encode failed: 'weight' must be 2-D
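My current suspicion (please correct me if I'm wrong): with ZeRO-3 active, transformers keeps a global HfDeepSpeedConfig and wraps from_pretrained() in deepspeed.zero.Init(), so the embedder's parameters get partitioned into 1-D shards, and the embedding lookup then fails with the error above. One workaround I'm considering is temporarily unsetting that global config around the load so the reward model stays whole on each rank — a minimal sketch, assuming transformers' DeepSpeed integration helpers:

```python
from sentence_transformers import SentenceTransformer
from transformers.integrations.deepspeed import (
    is_deepspeed_zero3_enabled,
    unset_hf_deepspeed_config,
)

def load_embedder_unpartitioned(name: str = "intfloat/e5-base"):
    """Load the embedding model without ZeRO-3 weight partitioning."""
    if is_deepspeed_zero3_enabled():
        # Drop the global HfDeepSpeedConfig so the from_pretrained() call
        # inside SentenceTransformer is not wrapped in deepspeed.zero.Init().
        unset_hf_deepspeed_config()
    # Keep the reward model off the training GPUs entirely.
    return SentenceTransformer(name, device="cpu")
```

I'm wary of this because unsetting the global config could affect anything else loaded afterwards, which is part of why I'm asking.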
My question:
What is the recommended way to use such external models (SentenceTransformer, CLIP, etc.) as reward functions in your GRPO setup?
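For example, is the intended pattern to keep the embedder on CPU outside the DeepSpeed engine, to score on a single rank and broadcast the rewards, or to run the embedder as a separate inference service? Here is a minimal sketch of the broadcast mechanics for the single-rank option, assuming torch.distributed is already initialized by the trainer (broadcast_rank0_scores is a hypothetical helper of mine, not something from your repo):

```python
import torch
import torch.distributed as dist

def broadcast_rank0_scores(scores, num_scores, device):
    """Broadcast rewards computed on rank 0 to every other rank.

    `scores` is the list of floats computed on rank 0 (ignored elsewhere);
    all ranks must call this collectively with the same `num_scores`.
    With the NCCL backend, `device` should be this rank's own GPU.
    """
    if dist.get_rank() == 0:
        t = torch.tensor(scores, dtype=torch.float32, device=device)
    else:
        t = torch.empty(num_scores, dtype=torch.float32, device=device)
    dist.broadcast(t, src=0)
    return t.tolist()
```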