Questions About Chosen-Data Strategies #29

Hi, amazing work, and thank you for making it open source!

1. After reviewing your code, I noticed that it includes several strategies for selecting DPO preference pairs. Have you compared these strategies, and if so, which one tends to perform better?

2. When incorporating chosen preference data (SFT data) into the original model, suppose the original model's outputs follow a completely different distribution from the chosen data and are also of lower quality. In that case, would you recommend training on preference pairs that combine the out-of-distribution (OOD) chosen data with model-generated responses, or only on preference pairs generated entirely by the original model?
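Regarding question 1, here is a minimal sketch of the kind of pair-selection strategies I mean, assuming each prompt has several sampled responses already scored by a reward model. The function names (`best_vs_worst`, `best_vs_random`, `best_vs_worst_with_margin`) are my own hypothetical labels, not names from your repo, and I am not claiming your code implements them this way.

```python
import random
from typing import List, Optional, Tuple

# A scored response: (text, reward). Rewards are assumed to come from some
# external reward model; this sketch does not compute them.
Scored = Tuple[str, float]


def _rank(responses: List[Scored]) -> List[Scored]:
    """Sort responses from highest to lowest reward."""
    return sorted(responses, key=lambda r: r[1], reverse=True)


def best_vs_worst(responses: List[Scored]) -> Tuple[str, str]:
    """Chosen = highest-reward response, rejected = lowest-reward response."""
    ranked = _rank(responses)
    return ranked[0][0], ranked[-1][0]


def best_vs_random(responses: List[Scored], rng: random.Random) -> Tuple[str, str]:
    """Chosen = highest-reward response, rejected = a random other response.

    Requires at least two responses.
    """
    ranked = _rank(responses)
    rejected = rng.choice(ranked[1:])
    return ranked[0][0], rejected[0]


def best_vs_worst_with_margin(
    responses: List[Scored], min_margin: float
) -> Optional[Tuple[str, str]]:
    """Like best_vs_worst, but drop the pair (return None) when the reward
    gap between chosen and rejected is smaller than min_margin."""
    ranked = _rank(responses)
    if ranked[0][1] - ranked[-1][1] < min_margin:
        return None
    return ranked[0][0], ranked[-1][0]


if __name__ == "__main__":
    samples = [("a", 0.2), ("b", 0.9), ("c", -0.4), ("d", 0.5)]
    print(best_vs_worst(samples))  # ('b', 'c')
    print(best_vs_random(samples, random.Random(0)))
    print(best_vs_worst_with_margin(samples, 2.0))  # None (gap is 1.3)
```

The main difference between these is how "hard" the rejected example is relative to the chosen one, which is what I would like to understand from your comparisons.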
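Regarding question 2, to make the two options concrete, here is a rough sketch of how each set of pairs could be built. `generate` stands in for sampling from the original model and `score` for a reward model; both are hypothetical placeholders, as are the function names, and the `prompt`/`chosen`/`rejected` keys are just an assumed pair format.

```python
from typing import Callable, Dict, List

Pair = Dict[str, str]


def ood_chosen_pairs(
    prompts: List[str],
    sft_chosen: Dict[str, str],
    generate: Callable[[str], str],
) -> List[Pair]:
    """Option A: chosen = external (possibly OOD) SFT response,
    rejected = a response sampled from the original model."""
    return [
        {"prompt": p, "chosen": sft_chosen[p], "rejected": generate(p)}
        for p in prompts
    ]


def on_policy_pairs(
    prompts: List[str],
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n_samples: int = 4,
) -> List[Pair]:
    """Option B: both sides sampled from the original model; the
    highest-scored sample is chosen, the lowest-scored one rejected."""
    pairs: List[Pair] = []
    for p in prompts:
        samples = [generate(p) for _ in range(n_samples)]
        ranked = sorted(samples, key=lambda s: score(p, s), reverse=True)
        pairs.append({"prompt": p, "chosen": ranked[0], "rejected": ranked[-1]})
    return pairs
```

In Option A the chosen side comes from a different (and higher-quality) distribution than the model's own outputs, whereas in Option B both sides stay on-policy; my question is essentially which of these you would expect to train better in that situation.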

Thanks in advance for your insights!
