
Commit bf835b5

[Flux] Reduce debug model size to speed up flux integration tests (#1295)
1. Reduce how often checkpoints are saved (checkpoint interval raised from 5 to 10).
2. Reduce the hidden dimension size and num_heads to trim the debug model. The smaller model is still adequate for debugging: it now has ~0.25B params (previously ~1B), and in local testing it converges faster and runs faster.
3. Delete checkpoints before uploading CI results to save upload time.
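As a rough sanity check of the ~1B to ~0.25B figure: most parameters in a transformer block live in weight matrices whose shape is proportional to hidden_size × hidden_size (attention projections, and MLP layers with a fixed mlp_ratio), so their count scales roughly with hidden_size². The sketch below assumes that quadratic term dominates and ignores terms that are only linear in hidden_size (biases, norms, input/output projections such as context_in_dim to hidden_size). It is an estimate, not an exact parameter count, and the helper name is hypothetical.

```python
def quadratic_param_ratio(old_hidden: int, new_hidden: int) -> float:
    """Ratio of new to old parameter count, assuming params scale with hidden_size**2."""
    return (new_hidden / old_hidden) ** 2


# hidden_size values from the diff to torchtitan/experiments/flux/__init__.py.
OLD_HIDDEN, NEW_HIDDEN = 3072, 1536
# Approximate previous model size quoted in the commit message.
OLD_PARAMS_BILLIONS = 1.0

ratio = quadratic_param_ratio(OLD_HIDDEN, NEW_HIDDEN)
estimate = OLD_PARAMS_BILLIONS * ratio
print(f"ratio={ratio}, estimated size ~{estimate}B params")
```

Halving hidden_size gives a ratio of 0.25, which lines up with the ~0.25B reported above. num_heads does not enter the estimate: for a fixed hidden_size, standard multi-head attention uses projection matrices of the same size regardless of how many heads they are split into.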
1 parent 5e09057 commit bf835b5


3 files changed, 5 additions, 3 deletions


.github/workflows/integration_test_8gpu_flux.yaml

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,7 @@ jobs:
 docker-image: torchtitan-ubuntu-20.04-clang12
 repository: pytorch/torchtitan
 upload-artifact: outputs
+# delete the checkpoints in the artifacts to save CI uploading time
 script: |
 set -eux
 
@@ -44,3 +45,4 @@ jobs:
 
 mkdir artifacts-to-be-uploaded
 python -m torchtitan.experiments.flux.tests.integration_tests artifacts-to-be-uploaded --ngpu 8
+rm -rf artifacts-to-be-uploaded/*/checkpoint
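The added `rm -rf artifacts-to-be-uploaded/*/checkpoint` step removes the `checkpoint` entry inside each immediate subdirectory of the artifacts folder; the glob `*` matches exactly one path component, so deeper `checkpoint` directories are left alone. For readers less used to shell globbing, here is a rough Python equivalent. The helper `delete_checkpoint_dirs` is hypothetical and is not part of the workflow or of torchtitan.

```python
import shutil
import tempfile
from pathlib import Path


def delete_checkpoint_dirs(artifacts_root: str) -> list[str]:
    """Roughly mirror `rm -rf <artifacts_root>/*/checkpoint`.

    Removes any `checkpoint` entry exactly one level below `artifacts_root`
    and returns the removed paths relative to the root.
    """
    removed = []
    # "*/checkpoint" matches <root>/<any-name>/checkpoint, one level deep only.
    # sorted() materializes the matches before anything is deleted.
    for path in sorted(Path(artifacts_root).glob("*/checkpoint")):
        if path.is_dir() and not path.is_symlink():
            shutil.rmtree(path)
        else:
            path.unlink()  # rm -rf also removes plain files and symlinks
        removed.append(str(path.relative_to(artifacts_root)))
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical per-test output folders, for illustration only.
        for test_name in ("test_a", "test_b"):
            (Path(root) / test_name / "checkpoint" / "step-10").mkdir(parents=True)
            (Path(root) / test_name / "train.log").write_text("log\n")
        print(delete_checkpoint_dirs(root))
        print(sorted(p.name for p in Path(root).rglob("*")))
```

Like `rm -rf`, the helper is a no-op when nothing matches, and it leaves logs and other non-checkpoint outputs in place, which is what keeps the uploaded artifacts useful while making them smaller.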

torchtitan/experiments/flux/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -81,9 +81,9 @@
 out_channels=64,
 vec_in_dim=768,
 context_in_dim=4096,
-hidden_size=3072,
+hidden_size=1536,
 mlp_ratio=4.0,
-num_heads=24,
+num_heads=12,
 depth=2,
 depth_single_blocks=2,
 axes_dim=(16, 56, 56),
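The diff halves `num_heads` together with `hidden_size`, so the per-head dimension stays at 3072 / 24 = 1536 / 12 = 128, which also equals sum(axes_dim) = 16 + 56 + 56. In the reference FLUX implementation the rotary-embedding axes must sum to the head dimension, which would explain why the two values change together; treat that constraint as an assumption about this codebase rather than something the commit states. A minimal check, using a hypothetical helper name:

```python
def head_dim(hidden_size: int, num_heads: int, axes_dim: tuple[int, ...]) -> int:
    """Per-head dimension; also checks the assumed RoPE constraint sum(axes_dim) == head_dim."""
    if hidden_size % num_heads != 0:
        raise ValueError(f"hidden_size={hidden_size} not divisible by num_heads={num_heads}")
    dim = hidden_size // num_heads
    if sum(axes_dim) != dim:
        raise ValueError(f"sum(axes_dim)={sum(axes_dim)} != head_dim={dim}")
    return dim


AXES_DIM = (16, 56, 56)  # unchanged by the commit
print(head_dim(3072, 24, AXES_DIM))  # previous debug config: 128
print(head_dim(1536, 12, AXES_DIM))  # new debug config: 128
```

Halving only `hidden_size` (1536 with 24 heads) would give a head dimension of 64 and fail this check.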

torchtitan/experiments/flux/train_configs/debug_model.toml

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ mode = "full"
 [checkpoint]
 enable_checkpoint = false
 folder = "checkpoint"
-interval = 5
+interval = 10
 last_save_model_weights_only = false
 export_dtype = "float32"
 async_mode = "disabled" # ["disabled", "async", "async_with_pinned_mem"]
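To see what raising `interval` from 5 to 10 does, the sketch below lists the training steps at which a checkpoint would be written, under the simplifying assumption that a save happens whenever `step % interval == 0` and at the final step. This is a hypothetical model of the schedule, not torchtitan's checkpointing code, and `NUM_STEPS = 20` is an illustrative value, not the debug config's actual step count. Since `enable_checkpoint = false` in this file, the interval only matters when checkpointing is enabled.

```python
def checkpoint_steps(num_steps: int, interval: int) -> list[int]:
    """Steps at which a save would happen under the assumed rule:
    every `interval` steps, plus the final step."""
    steps = [s for s in range(1, num_steps + 1) if s % interval == 0]
    if num_steps >= 1 and num_steps % interval != 0:
        steps.append(num_steps)  # assumed: the last step is always saved
    return steps


NUM_STEPS = 20  # hypothetical run length, for illustration only
print(checkpoint_steps(NUM_STEPS, interval=5))   # [5, 10, 15, 20]
print(checkpoint_steps(NUM_STEPS, interval=10))  # [10, 20]
```

Under this model, doubling the interval roughly halves the number of checkpoint writes, and with it the time spent saving during each test run.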
