RLHF_MT_Reward

This code uses COMET as a reward model to capture human preferences in machine translation (MT) and applies that reward in Proximal Policy Optimization (PPO) training.

This project builds on the work of Miraclemarvel55/ChatGLM-RLHF and aims to further extend and improve that project's functionality.
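To make the reward idea concrete, here is a minimal sketch of how a COMET-style score could be turned into a per-sample PPO reward. It is not taken from this repository's code. The names `build_samples`, `comet_rewards`, and `toy_score_fn`, and the `baseline` and `clip` parameters, are hypothetical and exist only for illustration. The `src`/`mt`/`ref` record layout follows the input format COMET models expect. The scorer is passed in as a callable so the sketch runs without downloading a model.

```python
from typing import Callable, Dict, List, Sequence

# One COMET-style input record: source text, machine translation, reference.
Sample = Dict[str, str]


def build_samples(
    sources: Sequence[str],
    hypotheses: Sequence[str],
    references: Sequence[str],
) -> List[Sample]:
    """Pack parallel lists into the src/mt/ref records a COMET model scores."""
    if not (len(sources) == len(hypotheses) == len(references)):
        raise ValueError("sources, hypotheses and references must align")
    return [
        {"src": s, "mt": h, "ref": r}
        for s, h, r in zip(sources, hypotheses, references)
    ]


def comet_rewards(
    samples: Sequence[Sample],
    score_fn: Callable[[Sequence[Sample]], List[float]],
    baseline: float = 0.0,  # assumption: subtract a constant baseline
    clip: float = 1.0,      # assumption: clip rewards for PPO stability
) -> List[float]:
    """Map one quality score per sample to one scalar reward per sample."""
    scores = score_fn(samples)
    if len(scores) != len(samples):
        raise ValueError("score_fn must return one score per sample")
    return [max(-clip, min(clip, s - baseline)) for s in scores]


def toy_score_fn(samples: Sequence[Sample]) -> List[float]:
    """Hypothetical stand-in for a COMET model: 1.0 on exact match, else 0.0."""
    return [1.0 if s["mt"] == s["ref"] else 0.0 for s in samples]


if __name__ == "__main__":
    batch = build_samples(["你好"], ["Hello"], ["Hello"])
    print(comet_rewards(batch, toy_score_fn, baseline=0.5))
```

In a real setup, `score_fn` would presumably wrap a loaded COMET model's `predict` call and return its per-segment scores. Each resulting reward would then be attached to the corresponding sampled translation in the PPO update.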

README.md
chatglm3_comet_zhbert_finetune.py
inf_chatglm3_lora.py
models3_rlhf.py
utils_chatglm3.py
