Hi.
The paper mentions that the offline vanilla DPO is trained on the Nectar dataset. I have several questions about that.
How do you process the Nectar dataset? Nectar is a 7-wise comparison dataset, so expanding each ranking into pairs yields about 3.8M pairwise comparisons in total. Do you use the whole dataset?
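
To make the question concrete, this is roughly how I would expect the pairwise expansion to work. It is only a sketch of my understanding, not the paper's actual preprocessing: the function name `expand_to_pairs` and the record keys `"answer"` and `"rank"` (with rank 1 as the best response) are my assumptions.

```python
from itertools import combinations


def expand_to_pairs(prompt, answers):
    """Turn one K-wise ranked record into (prompt, chosen, rejected) triples.

    `answers` is a list of dicts with assumed keys "answer" and "rank"
    (rank 1 = best). These names are hypothetical, not taken from the paper.
    """
    ranked = sorted(answers, key=lambda a: a["rank"])
    # combinations() preserves input order, so the first item of every pair
    # is always the better-ranked response.
    return [
        (prompt, better["answer"], worse["answer"])
        for better, worse in combinations(ranked, 2)
    ]


# Toy 7-wise record standing in for one dataset entry.
example = [{"answer": f"response {i}", "rank": i} for i in range(1, 8)]
pairs = expand_to_pairs("some prompt", example)
print(len(pairs))  # C(7, 2) = 21 pairs per 7-wise record
```

With 21 pairs per prompt, using every pair is what gives the 3.8M figure, so I am wondering whether you train on all of them or only on a subset (for example, best versus worst, or a random sample of pairs per prompt).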