Hi!
Are there any examples of realistic RL-tuning of a 32B model with a large model length (~32k, with ignore_eos=True
to ensure it can generate near-maximum-length responses)?
I found

model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-32B-Instruct}

but it suspiciously uses tp=1 and sp=1. Does 32B training fit on a single node in this experiment?
Or what is the actual number of response tokens in this experiment?
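
For context on why I'm unsure it fits, here is my rough back-of-the-envelope estimate (just a sketch: it assumes full-parameter training on an 8x80GB node with bf16 weights and fp32 Adam states fully sharded FSDP/ZeRO-3 style, and ignores activations, the rollout engine's KV cache, and fragmentation):

```python
# Rough per-GPU memory estimate for full-parameter training of a 32B model
# on one assumed 8x80GB node, with weights/grads/optimizer states fully sharded.

PARAMS = 32e9      # 32B parameters
GPUS = 8           # one node (assumed)
GPU_MEM_GB = 80    # assumed per-GPU memory

bytes_per_param = (
    2      # bf16 weights
    + 2    # bf16 gradients
    + 4    # fp32 master weights
    + 4    # Adam first moment (fp32)
    + 4    # Adam second moment (fp32)
)  # = 16 bytes/param for standard mixed-precision Adam

total_state_gb = PARAMS * bytes_per_param / 1e9
per_gpu_gb = total_state_gb / GPUS

print(f"weights + grads + optimizer states: ~{total_state_gb:.0f} GB total")
print(f"per GPU when fully sharded:         ~{per_gpu_gb:.0f} GB of {GPU_MEM_GB} GB")
# -> ~512 GB total, ~64 GB per GPU: it "fits" on paper, but leaves little
#    headroom for the activations of 32k-token sequences, which is why I'm
#    asking about sp even when tp=1 is feasible.
```
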
What would be your advice on setting tp and sp for 32B models (both for single-node and for multi-node)? Should we try sp=2, tp=2, or tp=2 x sp=2 configs?
Or should we just try sp=8 first?
Thanks!
Sorry, I got the label wrong. This should not be marked as a bug.