Commit 61ef5cf

checkpoint: error if --checkpoint.load_step is specified but not found (#1311)
Currently, when `--checkpoint.load_step=xxx` is specified (not -1) and no corresponding `checkpoint/step-xxx` folder is found, training will ~silently~ start from random initialization. Users have to check the log for a single line, `Loading checkpoint step ....`, to make sure the checkpoint was actually loaded. I think that's not very intuitive, and explicitly erroring out may be better.
1 parent 1ab4353 commit 61ef5cf

1 file changed, +3 -1 lines changed

torchtitan/components/checkpoint.py

Lines changed: 3 additions & 1 deletion
@@ -445,7 +445,9 @@ def load(self, step: int = -1) -> bool:
         checkpoint_id = self._create_checkpoint_id(step)
 
         if not os.path.isdir(checkpoint_id):
-            return False
+            raise FileNotFoundError(
+                f"--checkpoint.load_step={step} but checkpoint {checkpoint_id} is not found."
+            )
 
         logger.info(f"Loading the checkpoint from {checkpoint_id}.")
         begin = time.monotonic()
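
The behavior change can be tried in isolation with a minimal sketch. It is not the torchtitan implementation: `load_checkpoint` and `demo` are hypothetical stand-ins, and the `step-<N>` folder layout is assumed from the `checkpoint/step-xxx` naming in the commit message. Only the directory check and the error message mirror the diff.

```python
import os
import tempfile


def load_checkpoint(folder: str, step: int) -> str:
    """Hypothetical stand-in for `load`: resolve the step folder or fail loudly."""
    # Assumed layout: <folder>/step-<step>, following the commit message.
    checkpoint_id = os.path.join(folder, f"step-{step}")
    if not os.path.isdir(checkpoint_id):
        # Same message format as the diff; before the change this path
        # returned False and training continued from random initialization.
        raise FileNotFoundError(
            f"--checkpoint.load_step={step} but checkpoint {checkpoint_id} is not found."
        )
    return checkpoint_id


def demo() -> tuple[str, str]:
    """Return the folder name that was found and the error text for a missing step."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "step-100"))
        found = os.path.basename(load_checkpoint(root, 100))
        try:
            load_checkpoint(root, 200)  # no step-200 folder exists
        except FileNotFoundError as exc:
            return found, str(exc)
    return found, ""


if __name__ == "__main__":
    print(demo())
```

Raising instead of returning `False` means a mistyped step number stops the run immediately, rather than letting it proceed from an uninitialized model.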
