An error occurs when running dpgen #1175
Unanswered
maoxinxina asked this question in Q&A
Replies: 2 comments
- Did you find a solution, please? I'm having the same problem.
- It's an error reported by Slurm, saying that the node configuration you requested is not available. You might ask your cluster administrator what is available.
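  To act on that advice, you can also query Slurm directly before re-running dpgen. A minimal sketch, assuming the partition name `gpu` from the machine.json below; replace the node name with one that `sinfo` actually lists:

  ```bash
  # Show the nodes, CPU/memory/GRES layout, and state of the "gpu" partition
  sinfo -p gpu -o "%P %D %c %m %G %t"

  # Inspect a single node in detail to see how many CPUs/GPUs it really offers
  scontrol show node <node_name_from_sinfo>
  ```

  Compare that output with number_node, gpu_per_node, and any memory flags in machine.json; sbatch typically rejects the job with "Requested node configuration is not available" as soon as one of them cannot be satisfied by any node in the partition.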
- I am running dpgen on a Slurm cluster, and the following error occurred.
Description
```
2023-04-04 15:56:02,277 - INFO : info:check_all_finished: False
Traceback (most recent call last):
  File "/HOME/scz0aai/run/deepmd-kit/lib/python3.10/site-packages/dpdispatcher/submission.py", line 285, in handle_unexpected_submission_state
    job.handle_unexpected_job_state()
  File "/HOME/scz0aai/run/deepmd-kit/lib/python3.10/site-packages/dpdispatcher/submission.py", line 751, in handle_unexpected_job_state
    self.submit_job()
  File "/HOME/scz0aai/run/deepmd-kit/lib/python3.10/site-packages/dpdispatcher/submission.py", line 798, in submit_job
    job_id = self.machine.do_submit(self)
  File "/HOME/scz0aai/run/deepmd-kit/lib/python3.10/site-packages/dpdispatcher/utils.py", line 179, in wrapper
    return func(*args, **kwargs)
  File "/HOME/scz0aai/run/deepmd-kit/lib/python3.10/site-packages/dpdispatcher/slurm.py", line 84, in do_submit
    raise RuntimeError(
RuntimeError: status command squeue fails to execute
error message:sbatch: error: Batch job submission failed: Requested node configuration is not available
return code 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/HOME/scz0aai/run/deepmd-kit/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/HOME/scz0aai/run/deepmd-kit/lib/python3.10/site-packages/dpgen/main.py", line 233, in main
    args.func(args)
  File "/HOME/scz0aai/run/deepmd-kit/lib/python3.10/site-packages/dpgen/generator/run.py", line 5109, in gen_run
    run_iter(args.PARAM, args.MACHINE)
  File "/HOME/scz0aai/run/deepmd-kit/lib/python3.10/site-packages/dpgen/generator/run.py", line 4440, in run_iter
    run_train(ii, jdata, mdata)
  File "/HOME/scz0aai/run/deepmd-kit/lib/python3.10/site-packages/dpgen/generator/run.py", line 776, in run_train
    submission.run_submission()
  File "/HOME/scz0aai/run/deepmd-kit/lib/python3.10/site-packages/dpdispatcher/submission.py", line 222, in run_submission
    self.handle_unexpected_submission_state()
  File "/HOME/scz0aai/run/deepmd-kit/lib/python3.10/site-packages/dpdispatcher/submission.py", line 288, in handle_unexpected_submission_state
    raise RuntimeError(
RuntimeError: Meet errors will handle unexpected submission state.
Debug information: remote_root==/HOME/scz0aai/run/maoxin/dpgen_test/tmp2023/rererun/work/1f2a3a2a757b38d4b506119950b64ccf1c5c9d04.
Debug information: submission_hash==1f2a3a2a757b38d4b506119950b64ccf1c5c9d04.
Please check the dirs and scripts in remote_root. The job information mentioned above may help.
```
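The last lines of the traceback point at remote_root; inspecting the job script dpdispatcher generated there shows exactly which #SBATCH directives sbatch refused. A sketch of that check (the *.sub name is an assumption about dpdispatcher's usual script naming; list the directory if nothing matches):

```bash
# Go to the remote_root reported in the debug information above
cd /HOME/scz0aai/run/maoxin/dpgen_test/tmp2023/rererun/work/1f2a3a2a757b38d4b506119950b64ccf1c5c9d04

# Print the resource directives of the generated job script(s);
# the *.sub pattern is assumed, adjust it to the files actually present
grep -H "#SBATCH" *.sub
```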
The machine.json is set as:

```json
{
  "api_version": "1.0",
  "deepmd_version": "2.0.1",
  "train": [
    {
      "command": "dp",
      "machine": {
        "batch_type": "Slurm",
        "context_type": "local",
        "local_root": "./",
        "remote_root": "/HOME/scz0aai/run/maoxin/dpgen_test/tmp2023/rererun/work"
      },
      "resources": {
        "number_node": 1,
        "_cpu_per_node": 4,
        "gpu_per_node": 1,
        "group_size": 1,
        "queue_name": "gpu",
        "_custom_flags": ["#SBATCH --mem=20G"],
        "source_list": ["/HOME/scz0aai/run/deepmd-kit"],
        "module_list": ["cuda/11.6"]
      }
    }
  ],
  "model_devi": [
    {
      "command": "lmp",
      "machine": {
        "batch_type": "Slurm",
        "context_type": "local",
        "local_root": "./",
        "remote_root": "/HOME/scz0aai/run/maoxin/dpgen_test/tmp2023/rererun/work"
      },
      "resources": {
        "number_node": 1,
        "_cpu_per_node": 4,
        "gpu_per_node": 1,
        "group_size": 10,
        "queue_name": "gpu",
        "_custom_flags": ["#SBATCH --mem=20G"],
        "exlued_list": [],
        "source_list": ["source activate /HOME/scz0aai/run/deepmd-kit; module load cuda/11.6"],
        "module_list": []
      }
    }
  ],
  "fp": [
    {
      "command": "mpirun -np 4 vasp_std",
      "machine": {
        "batch_type": "Slurm",
        "context_type": "local",
        "local_root": "./",
        "remote_root": "/HOME/scz0aai/run/maoxin/dpgen_test/tmp2023/rererun/work"
      },
      "resources": {
        "number_node": 1,
        "cpu_per_node": 4,
        "gpu_per_node": 1,
        "_group_size": 125,
        "source_list": ["module load intel/parallelstudio/2017.1.5; export PATH=/HOME/scz0aai/run/vasp.5.4.4/bin:$PATH"]
      }
    }
  ]
}
```
How can I tackle this issue? Thanks a lot.
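One way to narrow this down (a sketch, not a confirmed fix) is to submit an equivalent throw-away job by hand, outside dpgen. The partition, node count, and GPU count below are copied from the train resources above; the --gres syntax and the script name test.slurm are assumptions, since clusters expose GPUs differently:

```bash
#!/bin/bash
#SBATCH --partition=gpu      # queue_name in machine.json
#SBATCH --nodes=1            # number_node
#SBATCH --gres=gpu:1         # gpu_per_node; GRES syntax is an assumption, confirm with your admin
#SBATCH --time=00:05:00

# Trivial payload: report where the job landed and which GPU is visible
hostname
nvidia-smi
```

If `sbatch test.slurm` fails with the same "Requested node configuration is not available" message, the combination of partition and resources is simply not offered by the cluster, and queue_name or the resources section in machine.json needs to be changed to something sinfo reports as available.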