Relaxed-System-Lab/multi-actor-data-selection

Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining

This is the repo for the paper Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining. We will release the model checkpoints, datasets, and code within the next few weeks.

Figure: Illustration of the multi-agent collaborative framework

Updates

Release plan

TODOs:

  • Model Checkpoints
  • BERT Topic Model Checkpoint
  • Labeled SlimPajama-670B datasets
  • Code for baselines and methods - will be released after acceptance
  • Summarize data-efficient pretraining methods ......
