local-rank argument issue #153

Open
azuryl opened this issue Mar 7, 2025 · 1 comment
azuryl commented Mar 7, 2025

python -m torch.distributed.launch --nproc_per_node=6 train.py --checkpoint ./pretrained_checkpoint/sam_vit_l_0b3195.pth --model-type vit_l --output work_dirs/hq_sam_l
/home/azuryl/anaconda3/envs/sam_hq2/lib/python3.10/site-packages/torch/distributed/launch.py:208: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects --local-rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

main()
W0306 18:12:40.932000 150874 site-packages/torch/distributed/run.py:792]
W0306 18:12:40.932000 150874 site-packages/torch/distributed/run.py:792] *****************************************
W0306 18:12:40.932000 150874 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0306 18:12:40.932000 150874 site-packages/torch/distributed/run.py:792] *****************************************
usage: HQ-SAM --output OUTPUT [--model-type MODEL_TYPE] --checkpoint CHECKPOINT [--device DEVICE] [--seed SEED] [--learning_rate LEARNING_RATE] [--start_epoch START_EPOCH]
[--lr_drop_epoch LR_DROP_EPOCH] [--max_epoch_num MAX_EPOCH_NUM] [--input_size INPUT_SIZE] [--batch_size_train BATCH_SIZE_TRAIN] [--batch_size_valid BATCH_SIZE_VALID]
[--model_save_fre MODEL_SAVE_FRE] [--world_size WORLD_SIZE] [--dist_url DIST_URL] [--rank RANK] [--local_rank LOCAL_RANK] [--find_unused_params] [--eval] [--visualize]
[--restore-model RESTORE_MODEL]
HQ-SAM: error: unrecognized arguments: --local-rank=5

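The launch command should be changed to use torchrun, which sets LOCAL_RANK in the environment instead of passing a --local-rank argument to the script:
torchrun --nproc_per_node=8 train.py --checkpoint ./pretrained_checkpoint/sam_vit_l_0b3195.pth --model-type vit_l --output work_dirs/hq_sam_l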
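
If the script itself needs to work under both launchers, one option is to accept either spelling of the flag and fall back to the environment variable. The following is only a minimal sketch, not the actual train.py from this repository: the helper names `build_parser` and `resolve_local_rank` are hypothetical, and only the `--local_rank` option and the `LOCAL_RANK` variable come from the output above.

```python
import argparse
import os


def build_parser():
    # Hypothetical stand-in for the script's parser; only the local-rank
    # option is shown here.
    parser = argparse.ArgumentParser("HQ-SAM")
    # Register both spellings so the hyphenated form passed by
    # torch.distributed.launch is no longer rejected as unrecognized.
    parser.add_argument(
        "--local_rank", "--local-rank",
        dest="local_rank",
        type=int,
        default=None,
    )
    return parser


def resolve_local_rank(argv=None):
    # parse_known_args ignores the script's other options in this sketch.
    args, _unknown = build_parser().parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    # torchrun does not pass the flag; it exports LOCAL_RANK instead.
    return int(os.environ.get("LOCAL_RANK", 0))
```

With this in place, `resolve_local_rank()` would read the rank from the command line when the flag is present and from `LOCAL_RANK` otherwise. The default of 0 for a run with neither is an assumption made for single-process use.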

azuryl commented Mar 7, 2025

You also need to install scikit-image: python -m pip install -U scikit-image
