Replies: 1 comment 1 reply
-
Do not load other Intel libraries when your environment is managed by conda, since conda may ship its own Intel packages (such as MKL) and the two can conflict.
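If it helps, here is a minimal sketch for spotting the conflict before running `dp`. It is not part of DeePMD-kit: the function name `foreign_mkl_dirs` is made up for illustration, and it assumes Linux, that the external Intel libraries reach the process through `LD_LIBRARY_PATH` (for example after a `module load`), and that `CONDA_PREFIX` points at the active environment.

```python
import glob
import os


def foreign_mkl_dirs(ld_library_path, conda_prefix):
    """Return LD_LIBRARY_PATH entries outside conda_prefix that contain libmkl_* files.

    Hypothetical helper for illustration only.
    """
    prefix = os.path.realpath(conda_prefix) if conda_prefix else None
    hits = []
    for entry in ld_library_path.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing ':'
        real = os.path.realpath(entry)
        inside_env = prefix is not None and (
            real == prefix or real.startswith(prefix + os.sep)
        )
        if inside_env:
            continue  # libraries shipped by the conda environment are fine
        # Assumption: MKL shared objects follow the usual libmkl_*.so naming.
        if glob.glob(os.path.join(real, "libmkl_*.so*")):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    found = foreign_mkl_dirs(
        os.environ.get("LD_LIBRARY_PATH", ""),
        os.environ.get("CONDA_PREFIX", ""),
    )
    for directory in found:
        print("MKL outside the conda environment:", directory)
```

If this prints a directory like the `mkl/lib/intel64_lin` one in the traceback, removing that entry from `LD_LIBRARY_PATH` (or unloading the module that sets it) before running `dp` is the first thing to try.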
-
Dear Developers,
I am using DeePMD-kit v3.0.0b3 for DPA-2 pretraining and fine-tuning. I installed DeePMD-kit v3.0.0b3 from source successfully, but it fails at runtime. Can anybody help me fix it? Thanks a lot!
command: dp --pt train input_torch.json --finetune ./OpenLAM_2.2.0_27heads_beta3.pt --model-branch H2O_H2O-PD
output:
Traceback (most recent call last):
File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/pt/cxx_op.py", line 39, in load_library
torch.ops.load_library(module_file)
File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/torch/_ops.py", line 1295, in load_library
ctypes.CDLL(path)
File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /share/apps/intel2020u4/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so: undefined symbol: mkl_sparse_z_bsr_ng_avx521_sp_mv_i4
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/bin/dp", line 8, in <module>
sys.exit(main())
File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/main.py", line 916, in main
deepmd_main = BACKENDS[args.backend]().entry_point_hook
File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/backend/pytorch.py", line 66, in entry_point_hook
from deepmd.pt.entrypoints.main import main as deepmd_main
File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/pt/__init__.py", line 4, in <module>
from deepmd.pt.cxx_op import (
File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/pt/cxx_op.py", line 95, in <module>
ENABLE_CUSTOMIZED_OP = load_library("deepmd_op_pt")
File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/pt/cxx_op.py", line 90, in load_library
raise RuntimeError(error_message) from e
RuntimeError: This deepmd-kit package is inconsitent with PyTorch Runtime, thus an error is raised when loading deepmd_op_pt. You need to rebuild deepmd-kit against this PyTorch runtime.
Also, the OSError message varies with the version of the Intel libraries that is loaded:
1) OSError: /share/apps/intel_ips_xe_2018u3/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so: undefined symbol: mkl_blas_zgemm_blk_info_hi_thr_bdz
2) OSError: /share/home/xxx/apps/intel/oneapi_20240201/mkl/2024.2/lib/libmkl_intel_thread.so: undefined symbol: mkl_sparse_d_csr_seq_sym_u_full_fw_sor_i4