Summary:
At some point, torchtitan + delayed scaling + compile broke. This PR fixes it by
switching the amax all-reduce to functional collectives.
It would be great to add a local repro as well; will follow up offline on what
may be missing in our current test coverage.
Test Plan:
```
# torchtitan run which is fixed by this PR
with-proxy CONFIG_FILE="./train_configs/debug_model.toml" ./run_llama_train.sh --float8.enable_float8_linear --training.compile --float8.scaling_type_input delayed --float8.scaling_type_weight delayed --float8.scaling_type_grad_output delayed

# error message without this PR:
# https://gist.github.com/vkuzo/dbf54cf4027fd49bfb8095d518c618af
```
Reviewers:
Subscribers:
Tasks:
Tags: