I would like to run vanilla (offline) DPO as a baseline to compare its performance with online DPO. Can I use this codebase to run that experiment, and if so, what is the command to run it? Thank you very much in advance.