
Commit c08c9d4
[float8 moe training] validate float8 moe parallelism config (#1360)
## Summary

Validate that only FSDP and HSDP are used for float8 MoE training. TP support is in progress, and CP/PP are untested. 2D+ parallelism is untested as well.

## Test plan

- Command: `NGPU=4 CONFIG_FILE="./torchtitan/experiments/llama4/train_configs/debug_model.toml" ./run_train.sh --training.steps=10 --model.converters="float8" --float8.recipe_name="rowwise" --float8.moe_fqns_prototype="experts" --parallelism.tensor_parallel_degree=2`
- Error: `AssertionError: Float8 MoE training prototype does not yet support tensor parallelism`
1 parent c31ff8b commit c08c9d4

1 file changed: +12 −0

torchtitan/components/quantization/float8.py

Lines changed: 12 additions & 0 deletions
```diff
@@ -57,6 +57,18 @@ def __init__(self, job_config: JobConfig, parallel_dims: ParallelDims):
         self.moe_fqns = float8_config.moe_fqns_prototype
         self.filter_fn = self._init_filter_fn(float8_config)
 
+        # Validate MoE training prototype limitations.
+        if self.moe_fqns:
+            assert (
+                job_config.parallelism.tensor_parallel_degree == 1
+            ), "Float8 MoE training prototype does not yet support tensor parallelism"
+            assert (
+                job_config.parallelism.pipeline_parallel_degree == 1
+            ), "Float8 MoE training prototype does not yet support pipeline parallelism"
+            assert (
+                job_config.parallelism.context_parallel_degree == 1
+            ), "Float8 MoE training prototype does not yet support context parallelism"
+
         if float8_config.recipe_name is not None:
             assert not float8_config.enable_fsdp_float8_all_gather, (
                 "using `float8_config.enable_fsdp_float8_all_gather` together "
```
