Post-train DeepSeek V3/R1 with DPO using just a few GPU nodes? #58
sfc-gh-sbekman started this conversation in Polls
-
This would be incredibly helpful.
-
Thanks a ton to everyone who voted for this project! I will start working on it once the Ulysses sequence parallelism integration is finished and this PR is merged: #45. We hope that those who are interested will want to collaborate on this work.
-
Hello AI Community!
We are considering which features to bring to ArcticTraining in the near future that would be valuable to the AI community. One feature under consideration is the ability to post-train the DeepSeek V3 or DeepSeek R1 model with DPO using just a few GPU nodes.
Our upper-bound estimate for post-training on 500M tokens with DPO is around 2-3 days on 8 H100 nodes (64 H100 GPUs in total).
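For a rough sense of scale, the estimate above can be turned into GPU-hours and an implied throughput. This is only a back-of-the-envelope sketch: the function name `implied_throughput` is made up for illustration, and it assumes the 500M figure counts every token processed during training and that all 64 GPUs are busy for the whole run. Because the time estimate is an upper bound, the throughput numbers it yields are lower bounds.

```python
# Back-of-the-envelope numbers implied by the estimate above.
# Assumptions (not from the original post): 500M is the total number of
# tokens processed, and all 64 GPUs are in use for the full duration.
TOKENS = 500_000_000      # tokens to post-train on
NUM_GPUS = 64             # 8 nodes x 8 H100 GPUs
SECONDS_PER_DAY = 86_400


def implied_throughput(days: float) -> dict:
    """Return GPU-hours and average tokens/second for a run lasting `days`."""
    seconds = days * SECONDS_PER_DAY
    cluster_rate = TOKENS / seconds
    return {
        "gpu_hours": NUM_GPUS * days * 24,
        "tokens_per_sec_cluster": cluster_rate,
        "tokens_per_sec_per_gpu": cluster_rate / NUM_GPUS,
    }


if __name__ == "__main__":
    for days in (2, 3):
        stats = implied_throughput(days)
        print(
            f"{days} days: {stats['gpu_hours']:.0f} GPU-hours, "
            f"{stats['tokens_per_sec_cluster']:.0f} tokens/s across the cluster, "
            f"{stats['tokens_per_sec_per_gpu']:.1f} tokens/s per GPU"
        )
```

Under these assumptions, the 2-3 day range corresponds to roughly 3,000 to 4,600 H100 GPU-hours.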
We would like your feedback: would you find this feature valuable, and would you use it if we were to build it?
It would be incredibly helpful if you could answer the poll and tell others about it.
If there are other features that you would like us to support, please feel free to share them in the comments below.
We are looking forward to hearing from you.
P.S. In case you didn't know, ArcticTraining is an open-source, easy-to-use post-training framework for NVIDIA GPUs built on top of DeepSpeed.
Best,
Snowflake AI Research
24 votes