Missing Checkpoint file #2

@GrobianTM

Thank you for providing the project source.

Unfortunately, training the network fails because the checkpoint file "ckpts/r101_dcn_fcos3d_pretrain.pth" is missing.

Is this a file you meant to provide with the repository?

Command:
$ ./tools/dist_train.sh ./projects/configs/Actformer/Actformer_base.py 1

Error:

2025-06-13 13:02:23,338 - mmdet - INFO - load checkpoint from local path: ckpts/r101_dcn_fcos3d_pretrain.pth
Traceback (most recent call last):
  File "./tools/train.py", line 259, in <module>
    main()
  File "./tools/train.py", line 248, in main
    custom_train_model(
  File "/workspace/ActFormer/projects/mmdet3d_plugin/bevformer/apis/train.py", line 27, in custom_train_model
    custom_train_detector(
  File "/workspace/ActFormer/projects/mmdet3d_plugin/bevformer/apis/mmdet_train.py", line 198, in custom_train_detector
    runner.load_checkpoint(cfg.load_from)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/base_runner.py", line 337, in load_checkpoint
    return load_checkpoint(
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/checkpoint.py", line 531, in load_checkpoint
    checkpoint = _load_checkpoint(filename, map_location, logger)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/checkpoint.py", line 470, in _load_checkpoint
    return CheckpointLoader.load_checkpoint(filename, map_location, logger)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/checkpoint.py", line 249, in load_checkpoint
    return checkpoint_loader(filename, map_location)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/checkpoint.py", line 265, in load_from_local
    raise IOError(f'{filename} is not a checkpoint file')
OSError: ckpts/r101_dcn_fcos3d_pretrain.pth is not a checkpoint file
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 454) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 689, in run
    elastic_launch(
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
***************************************
        ./tools/train.py FAILED        
=======================================
Root Cause:
[0]:
  time: 2025-06-13_13:02:27
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 454)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"
=======================================
Other Failures:
  <NO_OTHER_FAILURES>
***************************************
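
For anyone hitting the same error: as far as I can tell, the `OSError` in the traceback is raised by mmcv's `load_from_local` when the path is not an existing regular file, so it indicates that nothing exists at `ckpts/r101_dcn_fcos3d_pretrain.pth` relative to the working directory, rather than a corrupt file. Below is a minimal sketch of a pre-flight check that can be run from the repository root before launching training. `check_checkpoint` is a hypothetical helper written for this issue; it is not part of ActFormer or mmcv.

```python
import os.path as osp


def check_checkpoint(path):
    """Return (ok, message) saying whether `path` is an existing regular file.

    Hypothetical helper: it mirrors the kind of isfile() test that appears to
    trigger the error above, but reports the resolved absolute path.
    """
    abs_path = osp.abspath(osp.expanduser(path))
    if osp.isfile(abs_path):
        return True, f"found checkpoint: {abs_path}"
    if osp.isdir(abs_path):
        return False, f"{abs_path} is a directory, not a checkpoint file"
    return False, f"missing checkpoint: {abs_path}"


if __name__ == "__main__":
    # Path taken from the log line "load checkpoint from local path: ..."
    ok, msg = check_checkpoint("ckpts/r101_dcn_fcos3d_pretrain.pth")
    print(msg)
```

Because the path in the config is relative, the result depends on the directory the training script is started from, which is why the helper prints the absolute path it actually checked.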
