
Commit 760c1f4

jc-audet authored and tianyu-l committed
Add check for seq_len%tensor_parallel_degree==0 for parallelized Llama (#1312)
Mitigates #1306. Following the discussion in #1306, `seq_len % tensor_parallel_degree == 0` appears to be a necessary condition for the TP Llama3 model to work, since the current setup is a workaround for [this](pytorch/pytorch#130646) numerical issue with PyTorch DTensors of complex numbers. This PR makes the requirement explicit. --------- Co-authored-by: tianyu-l <150487191+tianyu-l@users.noreply.github.com>
1 parent 00a6cf3 commit 760c1f4
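
For reference, a minimal standalone sketch of the divisibility constraint this commit enforces; the helper name `check_seq_len_divisibility` and the example values are illustrative only and are not part of the torchtitan code (the real check, shown in the diff below, reads `job_config.training.seq_len`, `parallel_dims.tp`, and `parallel_dims.cp`):

# Illustrative sketch of the constraint added in this commit; names and
# example values are hypothetical, not taken from torchtitan itself.
def check_seq_len_divisibility(seq_len: int, tp_degree: int, cp_degree: int) -> None:
    # seq_len must be evenly divisible by tp_degree * cp_degree
    if seq_len % (tp_degree * cp_degree) != 0:
        raise ValueError(
            f"Sequence length {seq_len} must be divisible by the product of "
            f"TP degree ({tp_degree}) and CP degree ({cp_degree})."
        )

check_seq_len_divisibility(8192, 8, 2)    # OK: 8192 % (8 * 2) == 0
# check_seq_len_divisibility(8190, 8, 2)  # would raise: 8190 % 16 != 0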

File tree

1 file changed: +11 −0 lines changed


torchtitan/models/llama3/infra/parallelize.py

Lines changed: 11 additions & 0 deletions
@@ -45,6 +45,17 @@ def parallelize_llama(
     NOTE: The passed-in model preferably should be on meta device. Otherwise,
     the model must fit on GPU or CPU memory.
     """
+    # TODO: TP currently cannot handle uneven seq_len because we set `use_local_output=True`
+    # (to use plain Tensors), which was because of the bug in computation of complex
+    # numbers with DTensors when setting `use_local_output=False`.
+    # See https://github.com/pytorch/pytorch/issues/130646 and
+    # https://github.com/pytorch/torchtitan/issues/1306 for details.
+    assert (
+        job_config.training.seq_len % (parallel_dims.tp * parallel_dims.cp) == 0
+    ), f"""
+    Sequence length {job_config.training.seq_len} must be divisible by the product of TP degree
+    ({parallel_dims.tp}) and CP degree ({parallel_dims.cp}).
+    """

     if parallel_dims.tp_enabled:
         if (
