DengNingyuan/RLHF_MT_Reward

RLHF_MT_Reward

This code uses COMET as a reward model to capture human preferences in machine translation (MT), and applies that reward in Proximal Policy Optimization (PPO) training.

This project builds on the work of Miraclemarvel55/ChatGLM-RLHF and aims to further extend and improve that project's functionality.
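As a rough illustration of the idea (not this repository's actual code), the sketch below shows how a COMET-style scorer could be turned into per-sample rewards for a PPO step. The names `build_samples`, `translation_rewards`, and `toy_scorer` are hypothetical. The `src`/`mt`/`ref` keys follow the input format used by the `unbabel-comet` package, but the scorer here is a toy stand-in so the example runs without downloading a model. Normalizing the rewards within a batch is an assumption, a common PPO practice rather than something this project documents.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

Sample = Dict[str, str]
Scorer = Callable[[List[Sample]], List[float]]


def build_samples(
    sources: Sequence[str],
    hypotheses: Sequence[str],
    references: Sequence[str],
) -> List[Sample]:
    """Pack parallel lists into COMET-style dicts with src/mt/ref keys."""
    if not (len(sources) == len(hypotheses) == len(references)):
        raise ValueError("sources, hypotheses and references must have equal length")
    return [
        {"src": s, "mt": h, "ref": r}
        for s, h, r in zip(sources, hypotheses, references)
    ]


def translation_rewards(
    sources: Sequence[str],
    hypotheses: Sequence[str],
    references: Sequence[str],
    scorer: Scorer,
    normalize: bool = True,
) -> List[float]:
    """Score each hypothesis and optionally standardize scores within the batch."""
    samples = build_samples(sources, hypotheses, references)
    scores = [float(x) for x in scorer(samples)]
    if not normalize or len(scores) < 2:
        return scores
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All hypotheses scored the same: no preference signal in this batch.
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


def toy_scorer(samples: List[Sample]) -> List[float]:
    """Hypothetical stand-in for a COMET model: fraction of reference tokens
    that also appear in the hypothesis. A real setup would instead wrap a
    loaded COMET checkpoint and return its per-segment scores."""
    scores = []
    for sample in samples:
        ref_tokens = sample["ref"].split()
        mt_tokens = set(sample["mt"].split())
        hits = sum(token in mt_tokens for token in ref_tokens)
        scores.append(hits / max(len(ref_tokens), 1))
    return scores


if __name__ == "__main__":
    rewards = translation_rewards(
        sources=["猫坐在垫子上", "猫坐在垫子上"],
        hypotheses=["the cat sat on the mat", "a dog runs"],
        references=["the cat sat on the mat", "the cat sat on the mat"],
        scorer=toy_scorer,
    )
    print(rewards)
```

Passing the scorer in as a callable keeps the reward logic independent of any particular COMET checkpoint, so the PPO loop can be tested with a cheap stand-in before a real model is plugged in.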
