ValueError: Can't find a valid checkpoint at /data1/path/checkpoints/stage_2/checkpoint-50000 #78

@daixixiwang

Running sh ./scripts/tune_script/graphgpt_stage2.sh fails with the following error:

raise ValueError("Can't find a valid checkpoint at {resume_from_checkpoint}")
ValueError: Can't find a valid checkpoint at /data1/path/checkpoints/stage_2/checkpoint-50000

I have checked the contents of /data1/path/checkpoints/stage_2/checkpoint-50000 and listed the following files:

config.json pytorch_model-00001-of-00003.bin rng_state_1.pth
generation_config.json pytorch_model-00002-of-00003.bin

Has anyone encountered a similar issue, where the checkpoint files exist but the script reports that it cannot find a valid checkpoint?
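
One way to narrow this down, as a rough sketch: the message looks like the one raised by the Hugging Face transformers Trainer when resuming (an assumption, since the script's contents are not shown here). As far as I know, that check only accepts a folder containing a weights file or a shard index, such as pytorch_model.bin, pytorch_model.bin.index.json, model.safetensors, or model.safetensors.index.json; the exact list depends on the installed transformers version and should be verified there. The helper below, with the hypothetical name diagnose_checkpoint, reports which of those names are present and which numbered shards are missing.

```python
import os
import re

# Assumed file names: what the transformers Trainer is believed to look for
# when resuming. Verify against the installed transformers version.
EXPECTED_ANY = [
    "pytorch_model.bin",
    "pytorch_model.bin.index.json",
    "model.safetensors",
    "model.safetensors.index.json",
]

# Matches shard names such as pytorch_model-00001-of-00003.bin
SHARD_RE = re.compile(r"^pytorch_model-(\d{5})-of-(\d{5})\.bin$")


def diagnose_checkpoint(ckpt_dir):
    """Return recognized weight/index files and missing shard numbers."""
    files = set(os.listdir(ckpt_dir))

    # Which of the expected weight or index files exist in the folder.
    found = [name for name in EXPECTED_ANY if name in files]

    # Collect shard numbers that are present and the declared shard total.
    present = set()
    total = None
    for name in files:
        match = SHARD_RE.match(name)
        if match:
            present.add(int(match.group(1)))
            total = int(match.group(2))

    missing_shards = []
    if total is not None:
        missing_shards = [i for i in range(1, total + 1) if i not in present]

    return {"found": found, "missing_shards": missing_shards}


if __name__ == "__main__":
    import sys

    print(diagnose_checkpoint(sys.argv[1]))
```

If the file listing above is complete, this would report no recognized weights or index file and shard 3 of 3 missing. That would point to an incompletely saved checkpoint (for example, an interrupted save) rather than a wrong path.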
