Commit 170b992

call init_weights before generation (#1371)
Since #1338 the `freqs_cis` buffer is no longer persisted/read in any code path; the intention is that it is re-calculated at model loading/initialization. However, this requires calling `init_weights` on the model, which `scripts/test_generate.py` currently is not doing. As of right now, running generation on the pretrained Llama 3 models results in garbled outputs.

Convert weights:

```
python ./scripts/convert_llama_to_dcp.py /home/emozilla/hf/Llama-3-8B/original /home/emozilla/dcp/Llama-3-8B
```

Run generation:

```
CONFIG_FILE=./torchtitan/models/llama3/train_configs/llama3_8b.toml CHECKPOINT_DIR=/home/emozilla/dcp/Llama-3-8B PROMPT="A long time ago in a galaxy far, far away" ./scripts/generate/run_llama_generate.sh
```

HEAD:

```
<|begin_of_text|>A long time ago in a galaxy far, far away000 centershift Equity KelleyYe требаyrais& Romgraph1Kォ IDEA globalčil at390dagThe,inLikeBelow uptimeRoman_constsBothtz_RATE phủ
```

With fix:

```
<|begin_of_text|>A long time ago in a galaxy far, far away… Aspirations were bursting and Jedi were making a big imprint in the arts, in the government, and in our lives.  That was 34 or
```
1 parent 5a26243 commit 170b992

File tree

1 file changed: 2 additions, 0 deletions


scripts/generate/test_generate.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -140,6 +140,8 @@ def test_generate(

     # materalize model
     model.to_empty(device=device_type)
+    with torch.no_grad():
+        model.init_weights()
     model.eval()

     state_dict = model.state_dict()
```
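The underlying pattern can be sketched with a minimal, hypothetical module (`TinyRoPEModel` and its helpers are illustrative, not torchtitan's actual code): a buffer registered with `persistent=False` never round-trips through the checkpoint, so after `to_empty()` replaces its storage, the only way to get valid values back is to recompute them, which is what `init_weights` does before generation.

```python
import torch
import torch.nn as nn


class TinyRoPEModel(nn.Module):
    """Hypothetical stand-in for torchtitan's Transformer: it keeps a
    non-persistent `freqs_cis` buffer, which (since #1338) is excluded
    from the checkpoint state_dict."""

    def __init__(self, dim: int = 8, seq_len: int = 4):
        super().__init__()
        self.dim, self.seq_len = dim, seq_len
        self.proj = nn.Linear(dim, dim)
        # persistent=False: never saved to / loaded from a checkpoint
        self.register_buffer(
            "freqs_cis", self._freqs(dim, seq_len), persistent=False
        )

    @staticmethod
    def _freqs(dim: int, seq_len: int) -> torch.Tensor:
        # Standard RoPE-style frequencies as a complex tensor
        inv = 1.0 / (10000.0 ** (torch.arange(0, dim, 2).float() / dim))
        t = torch.arange(seq_len).float()
        return torch.polar(torch.ones(seq_len, dim // 2), torch.outer(t, inv))

    def init_weights(self) -> None:
        # Recompute the non-persistent buffer (the real init_weights
        # also re-initializes parameters).
        self.freqs_cis = self._freqs(self.dim, self.seq_len)


model = TinyRoPEModel()
expected = model.freqs_cis.clone()

model.to_empty(device="cpu")  # as in test_generate.py: storages replaced, values lost
with torch.no_grad():
    model.init_weights()      # the fix: re-materialize the buffer
```

Because `freqs_cis` is non-persistent, `load_state_dict` alone can never restore it; skipping `init_weights` leaves the buffer uninitialized, which matches the garbled output shown above.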
