
Llama4 training does not automatically use bfloat16 when FSDP2 is enabled #1332

@danielvegamyhre


Bug description

For Llama3, when using FSDP2 the model weights are automatically converted to bfloat16 for mixed precision training. However, I noticed that with Llama4 and FSDP2 I have to manually cast the weights to bfloat16 before this line, otherwise they stay in fp32 and I can't use float8 GEMMs.
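
For context, a minimal sketch of how FSDP2 mixed precision is normally applied (the behavior I expected for Llama4). This assumes the `torch.distributed.fsdp.fully_shard` / `MixedPrecisionPolicy` import paths from recent PyTorch releases, and `model.layers` is an illustrative attribute, not torchtitan's exact parallelization code:

```python
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import MixedPrecisionPolicy, fully_shard


def shard_with_bf16(model: torch.nn.Module) -> torch.nn.Module:
    # Single mesh dimension over all ranks; torchtitan builds richer meshes,
    # this is just the minimal FSDP2 setup.
    mesh = init_device_mesh("cuda", (dist.get_world_size(),))
    mp_policy = MixedPrecisionPolicy(
        param_dtype=torch.bfloat16,  # unsharded params / compute in bf16
        reduce_dtype=torch.float32,  # keep gradient reduction in fp32
    )
    # Shard transformer blocks individually, then the root module,
    # mirroring the usual FSDP2 wrapping pattern.
    for block in model.layers:
        fully_shard(block, mesh=mesh, mp_policy=mp_policy)
    fully_shard(model, mesh=mesh, mp_policy=mp_policy)
    return model
```

With this policy the registered parameters stay in fp32 but the unsharded compute copies are bf16; the problem I'm seeing is that for Llama4 the compute still runs in fp32 unless I cast the weights myself.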

Versions

torchtitan latest main branch
