v3.0.0 #4409
Announced by njzjz in Announcements
- Really exciting, @njzjz! Very curious: are there any performance differences between the TensorFlow, PyTorch, and JAX backends? I am thinking of getting this version up and running on LUMI, an AMD-based machine.
- Is the offline package with CUDA 11 no longer provided?
DeePMD-kit v3: Multiple-backend Framework, DPA-2 Large Atomic Model, and Plugin Mechanisms
After eight months of public testing, we are excited to present the first stable release of DeePMD-kit v3, which enables deep potential models with TensorFlow, PyTorch, or JAX backends. Additionally, DeePMD-kit v3 introduces support for the DPA-2 model, a novel architecture optimized for large atomic models. This release also enhances the plugin mechanisms, making it easier to integrate and develop new models.
Highlights
Multiple-backend framework: TensorFlow, PyTorch, and JAX support
DeePMD-kit v3 adds a versatile, pluggable framework that provides a consistent training and inference experience across multiple backends. Version 3.0.0 includes the TensorFlow, PyTorch, and JAX backends.
Critical features of the multiple-backend framework include the ability to:
- convert models between backends using dp convert-backend, with backend-specific file extensions (e.g., .pb for TensorFlow and .pth for PyTorch), as sketched below;
- run inference across backends via dp test, the Python/C++/C interfaces, or third-party packages (e.g., dpdata, ASE, LAMMPS, AMBER, GROMACS, i-PI, CP2K, OpenMM, ABACUS, etc.).
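For example, a model trained with the TensorFlow backend can be converted to the PyTorch format and then evaluated there. A minimal sketch (the model filenames and data path are placeholders):

# convert a TensorFlow model file to the PyTorch format
dp convert-backend frozen_model.pb frozen_model.pth
# evaluate the converted model on a dataset with the PyTorch backend
dp --pt test -m frozen_model.pth -s path/to/data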
DPA-2 model: a large atomic model as a multi-task learner
The DPA-2 model offers a robust architecture for large atomic models (LAM), accurately representing diverse chemical systems for high-quality simulations. In this release, DPA-2 can be trained using the PyTorch backend, supporting both single-task (see examples/water/dpa2) and multi-task (see examples/water_multi_task/pytorch_example) training schemes. DPA-2 is also available for Python/C++ inference in the JAX backend.
The DPA-2 descriptor comprises two components, repinit and repformer.
The PyTorch backend supports training strategies for large atomic models, including:
- multi-task training, with an example configuration in examples/water_multi_task/pytorch_example/input_torch.json;
- fine-tuning from a pre-trained model via the --finetune argument of the dp --pt train command line (see the sketch after this list).
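A minimal sketch of both schemes (the fine-tuning input script and the pre-trained model filename are placeholders):

# multi-task training with the bundled example configuration
dp --pt train examples/water_multi_task/pytorch_example/input_torch.json
# fine-tune from a pre-trained model
dp --pt train input.json --finetune pretrained_model.pt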
Plugin mechanisms for external models
In version 3.0.0, plugin capabilities have been implemented to support the development and integration of potential energy models using the TensorFlow, PyTorch, or JAX backends, leveraging DeePMD-kit's trainer, loss functions, and interfaces. One plugin example is deepmd-gnn, which supports training the MACE and NequIP models within DeePMD-kit using the familiar commands:

dp --pt train mace.json
dp --pt freeze
dp --pt test -m frozen_model.pth -s ../data/
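A model trained through a plugin can then be evaluated like any other DeePMD-kit model, for example through the Python interface. A minimal sketch, assuming deepmd-gnn is installed, frozen_model.pth comes from the commands above, and the two-species toy system is made up for illustration:

import numpy as np
from deepmd.infer import DeepPot

dp = DeepPot("frozen_model.pth")  # backend is inferred from the file extension
coords = np.random.rand(1, 6 * 3)       # 1 frame, 6 atoms, flattened xyz
cells = 10.0 * np.eye(3).reshape(1, 9)  # 10 Å cubic box
atom_types = [0, 0, 0, 1, 1, 1]
e, f, v = dp.eval(coords, cells, atom_types)  # energy, forces, virial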
Other new features
- New training options: max_ckpt_keep (#3441), change_bias_after_training (#3993), and stat_file.
- New command-line interfaces: dp change-bias (#3993) and dp show (#3796).

Breaking changes
- TensorFlow model files use the .pb extension.
- The set_prefix key is deprecated. (#3753)
- Training and dp test now use all sets; in previous versions, only the last set was used as the test set in dp test. (#3862)
- The deepmd module was moved to deepmd.tf without other API changes, and deepmd_utils was moved to deepmd without other API changes. (#3177, #3178)
- DeepTensor (including DeepDipole and DeepPolar) now returns the atomic tensor in the dimension of natoms instead of nsel_atoms (#3390); a short Python sketch of this and the module move follows below.

For other changes, refer to the full changelog: v2.2.11...v3.0.0
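A short Python sketch of the last two items (the model filename is a placeholder, and the v2-style top-level import of DeepDipole is assumed to remain available):

import numpy as np

# TensorFlow-specific code that used to import from deepmd now imports from deepmd.tf;
# backend-independent classes such as DeepDipole live under deepmd (formerly deepmd_utils).
from deepmd.infer import DeepDipole

dd = DeepDipole("dipole_model.pb")
coords = np.random.rand(1, 6 * 3)  # 1 frame, 6 atoms
atom_types = [0, 0, 0, 1, 1, 1]
dipole = dd.eval(coords, None, atom_types)  # None: no periodic cell

# v3 returns the atomic tensor with shape (nframes, natoms, 3);
# v2 returned (nframes, nsel_atoms, 3), covering only the selected atoms.
print(dipole.shape)  # (1, 6, 3)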
Contributors
The PyTorch backend was developed in the dptech-corp/deepmd-pytorch repository and then fully merged into the deepmd-kit repository in #3180. Contributions to the deepmd-kit repository include the following merged pull requests:
preprocess_shared_params when using non-zero share level #3615 feat(pt): Add command to check the available model branches in multi-task pre-trained model (Issue #3742) #3796 docs: add document equations for se_atten_v2 #3828 Feat: add se_atten_v2 to PyTorch and DP #3840 feat pt: Support property fitting #3867 Fix(load_library): When ENABLE_CUSTOMIZED_OP = False, change op_info = None to op_info = {} #3912 fix: bugs in uts for property fit #4120 fix(pt): finetuning property/dipole/polar/dos fitting with multi-dimensional data causes error #4145 fix(dptest): Wrong dptest results except for energy head #4280 use_aparam_as_mask for pt backend #4246 feat(pt): support loss plugin for external package #4248 change_energy_bias and fix finetune #3378 pt: Add support for dipole and polar training #3380 Doc: Update PT Multi-task #3387 pt: fix params with no docstrs #3388 pt: fix multitask print_summary #3409 pt: fix multitask stuck on multiple-gpu #3411 Add max_ckpt_keep for trainer #3441 pt: cleanup tester #3442 pt: add explicit decay_rate for lr #3445 pt: add index input for use_spin #3456 pt: support multitask finetune #3480 pt: refactor loss #3569 pt: fix loss training when no data available #3571 pt: support multitask dp test #3573 pt: fix typo in multitask finetune #3607 Fix fine-tuning entries bug when doing restart. #3616 pt: use unified activation #3619 refact: the DPA1 descriptor #3696 Fix typo in smooth_type_embdding #3698 feat: Support stripped type embedding in DPA1 of PT/DP #3712 docs: add doc for multitask fine-tuning #3717 bug: fix spin nlist in spin_model #3718 bug: fix numerical diff in DPA1 dotr between DP/PT #3725 pt: remove old impl of DescrptBlockHybrid #3746 bug: fix no raise RuntimeError #3748 refact: the DPA2 descriptor #3758 breaking: remove multi-task support in tf #3763 breaking: seperate params in dpa2 #3768 feat: support seed for pt/dp models #3773 fix: lcurve header wrong when no validation data #3774 feat(pt): support disp_training and time_training in pt #3775 feat(pt/tf/dp): support econf type embedding #3781 feat(pt): support complete form energy loss #3782 chore(pt): lower the atol for dpa2 test #3785 feat(pt): consistent fine-tuning with init-model #3803 feat(dp/pt): refactor se_e3 descriptor #3813 fix: lower the atol for DPA2 corner case #3814 fix: DPA1 should use masked gate when excluded_types #3815 fix(pt): build nlist faster with torch.amax #3826 fix: bugs in uts for polar and dipole fit #3837 fix: correct exclude_types in descriptors #3841 fix: rm unused import of __version__ #3842 fix: use get_model_def_script method #3843 test(pt/dp): add universal uts for all models #3873 fix(UT): rm extra tearDown in test_training.py #3906 feat(pt): support fine-tuning from random fitting #3914 fix(pt): fix seed in dpmodel fitting #3916 feat(pt): support multitask argcheck #3925 feat(pt/tf): init-(frz)-model use pretrain script #3926 fix(pt): info typo in log #3927 feat(pt/tf): add bias changing param/interface #3933 fix(pt): fix global bias stat with different natom #3944 feat(pt): add datafile option for change-bias #3945 fix(dp): fix dp seed in dpa2 descriptor #3957 breaking(pt/tf/dp): disable bias in type embedding #3958 fix(pt): add finetune_head to argcheck #3967 feat(tf): improve the activation setting in tebd #3971 fix(pt/tf/dp): normalize the econf #3976 fix(pt): make 'find_' to be float in get data #3992 fix(pt): fix lammps nlist sort with large sel #3993 fix(pt): optimize graph memory usage #4006 fix(pt): fix get_dim for DescrptDPA1Compat #4007 fix(pt): use user seed in DpLoaderSet #4015 feat(pt/dp): support three-body type embedding #4066 breaking(pt/dp): tune new sub-structures for DPA2 #4089 docs(pt): examples for new dpa2 model #4138 fix(pt/dp): share params of repinit_three_body #4139 fix(pt): make state_dict safe for weights_only #4148 fix(pt ut): make separated uts deterministic #4162 fix(pt): make int rcut safe after jit op #4222 Chore(pt): rm old pt implementation #4223 feat(pt): support CPU parallel training with PT #4224 Chore(pt): refactor the command function interface #4225 fix(tf): fix compress suffix in DescrptDPA1Compat #4243 Chore(pt): slim uts for dpa1 #4244 feat(tf/pt): add/refact lammps & C++ support for spin model #4321 fix(dp/pt): support auto sel for dpa2 #4323 fix(pt/dp): make dpa2 convertable to .dp format #4324 fix(pt): fix precision #4344 fix(pt): fix not used sys_probs #4353 feat(pt): add universal test for loss #4354 docs: set precision explicitly in the DPA-2 example #4372 doc: update spin lmp doc and example #4375 breaking(pt/dp): change env_protection for spin #4394 Chore(doc): merge multitask training doc #4395 [BUG] seed is unsafe in TF parallel training #4440 deepmd_utils
#3173 add dpdata driver #3174 add more abstractmethods to universal DeepPot #3175 cc: reimplement read_file_to_string without calling TensorFlow #3176 breaking: move deepmd to deepmd.tf #3177 breaking: move deepmd_utils to deepmd #3178 docs: rewrite README; deprecate manually written TOC #3179 cc: fix returning type of sel_types #3181 breaking: drop Python 3.7 support #3185 Fix PT DeepPot and replace ASE calculator #3186 Merge TF and PT CLI #3187 PT: keep the same checkpoint behavior as TF #3191 docs: document PyTorch backend #3193 drop tqdm #3194 add size and replace arguments to deepmd.utils.random.choice #3195 reorganize tests directory #3196 fix: install CU11 PyTorch in the CU11 docker image #3198 allow disabling TensorFlow backend during Python installation #3200 throw errors when PyTorch CXX11 ABI is different from TensorFlow #3201 pt: add tensorboard and profiler support #3204 pt: set nthreads from env #3205 build macos-arm64 wheel on M1 runners #3206 fix GPU test OOM problem #3207 refactor DeepEval #3213 fix compile gromacs with precompiled C library #3217 pt: apply global user set precision to pt #3220 gmx: fix include directive #3221 pt: apply global logger to pt #3222 c: fix all memory leaks; add sanitizer checks #3223 pt: rename atomic_virial to atom_virial in the model output #3226 add category property to OutputVariableDef #3228 fix DP_ENABLE_TENSORFLOW support #3229 c: change the required shape of electric field to nloc * 3 #3237 pin docker actions to major versions #3238 drop deepmd.tf.cluster.slurm #3239 refactor print summary #3243 issue template: change TF version to backend version #3244 add get_type_map method to model; export model methods #3247 backend-indepedent dp test #3249 pt: infer model type from ModelOutputDef #3250 pt: support loading frozen models in DeepEval #3253 tf: support checkpoint path (instead of directory) in dp freeze #3254 ipi: remove normalize_coord #3257 pt: add exported methods to BaseAtomicModel #3258 support TF se_e2_a serialization; add a common test fixture to compare TF, PT, and DP models #3263 enable docstring code format #3267 add neighbor stat support with NumPy and PyTorch implementation #3271 tf: refactor neighbor stat #3275 pt: fix torchscript converage #3276 improve gh actions #3283 speed up cuda test #3284 pt: refactor data stat #3285 consistent energy fitting #3286 fix gh actions issues #3288 gh actions: fix branches ignore pattern & fix activity types #3290 dp&pt: let DPAtomicModel fetch attributes from Fitting #3292 pt: process frames in parallel for env mat stat #3293 pluggable backend #3294 pt: avoid set_default_dtype in tests #3303 fix neighbor stat mixed_types input #3304 consistent energy model #3306 pt: explicitly set device #3307 allow both absulute and relative tolerance when testing consistency #3308 merge compute_output_stat #3310 tf: add fparam/aparam support for finetune #3313 pt: fix se_e2_a precision cast #3315 merge common subcommands in cli #3316 pt: export model_output_type instead of model_output_def #3318 feat: convert model files between backends #3323 store type in descriptor serialization data #3325 fix AlmaLinux GPG key error #3326 pt: remove env.DEVICE in all forward functions #3330 store type in fitting serialization data #3331 feat: add NumPy DeepPot #3332 docs: install pytorch in RTD #3333 add BaseModel; store type in serialization #3335 feat(pt/dpmodel): support type_one_side in se_e2_a #3339 pt: apply argcheck to pt #3342 bump python to 3.12 in the test environment #3343 apply PluginVariant and make_plugin_registry to classes #3346 feat: update sel by statistics #3348 add @version to serialization data #3349 pt: support --init-frz-model #3350 feat(pt): support fparam/aparam in DeepEval #3356 feat(pt): support fparam/aparam in C++ DeepPot #3358 docs: dpmodel, model conversion #3360 pt: fix se_a type_one_side performance degradation #3361 docs: apply type_one_side=True to se_a and se_r #3364 Hybrid descriptor #3365 fix se_r consistency #3366 bump scikit-build-core to 0.8 #3369 feat: atom_ener in energy fitting #3370 docs: DPRc for PT, DPModel #3373 sync descriptor alias #3374 pt: supprot --output in dp train #3377 tf: remove freeze warning for optional nodes #3381 fix: prevent deepmd.tf be imported globally #3382 pt: print data summary #3383 pt: expand systems before training #3384 pt: add fparam/aparam data requirements #3386 breaking: change DeepTensor output dim from nsel_atoms to natoms #3390 throw errros if rmin is no less than rmax #3393 allow loading either nsel or natoms atomic tensor data #3394 pt: ban torch.testing.assert_allclose #3395 do not return g2, h2, sw in hybrid descriptors #3396 format training logging #3397 bump LAMMPS to stable_2Aug2023_update3 #3399 fix github actions for release #3402 fix deepmd-kit-cu11 again #3403 set NUM_WORKERS to 0 in test_cuda action #3404 pt: Fix compilation with libtorch #3405 ban print #3415 convert exclude_types to sel_type #3418 revert test Python to 3.11 #3419 pt: avoid torch.tensor(constant) during forward #3421 pt: make get_data non-blocking #3422 pt: fix print_on_training when there is no validation data #3423 pt: avoid D2H in se_e2_a #3424 pt: improve nlist performance #3425 Consistent activation functions between backends #3431 fix errors when dp is executed without any subcommands #3437 refactor: split Model and AtomicModel #3438 pt: make jit happy with torch 2.0.0 #3443 fix: do not install tf-keras for cu11 #3444 fix: remove model_def_script from AtomicModel #3449 feat(pt): consistent "frozen" model #3450 chore: remove unused init_fitting_stat #3453 ci: reduce ASLR entropy #3461 docs: add deprecation notice for the official conda channel and more conda docs #3462 fix(tf): fix DeepEval degradation for virtual types #3464 fix(pt): Fix PairTabAtomicModel OOM error #3484 Clean TODOs and convert them into issues #3519 fix(pt): fix a typo in DeepEval to check do_atomic_virial #3570 test: add LAMMPS MPI tests #3572 ci: add linter for markdown, yaml, CSS #3574 fix: move DeepPotential from deepmd.tf.infer to deepmd.infer #3580 fix(tf): fix bugs in tensor training and migrate to reformat data #3581 tf: add explict mixed_types argument to fittings #3583 chore: remove incorrect memset TODOs #3600 feat: Support bfloat16 and ensure valid precision and activation functions consistent everywhere #3601 feat: Add USE_PT_PYTHON_LIBS CMake variable #3605 pin nvidia-cudnn-cu{11,12} to <9 #3610 feat: consistent type embedding #3617 chore(build): move static part of dynamic metadata to pyproject.toml #3618 feat(pt): add op library #3620 chore: move source/op to source/op/tf #3621 fix: fix type hint of sel #3624 feat: apply descriptor exclude_types to env mat stat #3625 fix: fix DPOSPath.save_numpy, DPH5Path.is_file, DPH5Path.is_dir #3631 fix(tf): make se_atten_v2 masking smooth when davg is not zero #3632 feat(pt): allow using directories to store stat #3633 fix: set rpath for libtorch and protobuf #3636 fix(tf): apply exclude types to se_atten_v2 switch #3651 fix: fix git version detection in docker_package_c.sh #3658 fix(pt): fix model_def_script #3671 CI: Accerate GitHub Actions using uv #3676 fix(tf): fix float32 for exclude_types in se_atten_v2 #3682 docs: setup uv for readthedocs #3685 docs: fix pdf build due to svg #3686 ci: format bibtex #3687 docs: fix convert svg error on RTD #3688 docs: fix RTD timeout issue #3694 ci(build): use uv for cibuildwheel #3695 chore(dpmodel): move save_dp_model and load_dp_model to a seperated module #3701 build(tf): remove keras from dependencies #3709 test(hybrid): add ut for descriptor hybrid #3711 build(deps): bump tar from 6.1.14 to 6.2.1 in /source/nodejs #3714 tests: move init_models to setUpModule #3715 ci(test): split each pytest job into 6 separated jobs #3716 build: unpin tensorflow version on windows #3721 feat(C): add preprocessor define for C API version #3737 breaking: deprecate set_prefix #3753 test(pt): add common test case for model/atomic model #3767 ci: speed up Python test #3776 tests: skip attention-related parameterize when attn_layer is 0 #3784 test(python): enable building PT OP #3787 chore: improve type anotations in deepmd.infer #3792 style: enable W rules #3793 feat(tf): pass rcut to PairTab #3794 refactor: remove global data_requirements #3798 fix: remove --input_script from dp test #3800 lmp: improve error message when compute/fix is not found #3801 docs: update DPA-1 reference #3810 chore: cleanup out-of-date TODOs #3811 chore: rename j_must_have to j_deprecated and only warn about deprecated keys #3816 ci: fix test-python test_durations and its caches #3820 refactor: refactor update_sel and save min_nbor_dist #3829 docs: improve se_atten documentation #3832 fix: fix DeepGlobalPolar and DeepWFC initlization #3834 fix: fix ipi package #3835 fix(pt): improve out-of-memory handling #3836 fix(pt): loose tolerance for TransTest #3838 chore: remove type embedding TODO from se_r serialize #3845 ci: bump ase to 3.23.0 #3846 feat: support generating JSON schema for integration with VSCode #3849 feat: add has_message_passing API #3851 fix(tf): fix modifier_type in DeepEval #3855 fix(test): make unit tests deterministic #3856 fix(pt): improve out-of-memory capture #3857 fix(tf): throw RuntimeError for se_a + type_embedding #3861 breaking: use all sets for training and test #3862 fix(ci): pin uv to 0.2.10 #3870 docs: fix footnote #3872 Revert "fix(ci): pin uv to 0.2.10" #3874 docs: improve multi-backend documentation #3875 fix: add mendeleev to dependencies; remove dpdata; remove catching ImportError #3878 feat: add seeds to dpmodel and fix seeds in tf & pt #3880 chore: replace reduciable with reducible #3888 chore(ci): workaround to retry error decoding response body from uv #3889 fix(cmake): fix PyTorch_LIBRARY_PATH #3890 feat(pt): allow PT OP CXXABI different from TF #3891 fix(cmake): fix OP_CXX_ABI_PT on macos/windows #3893 ci(wheel): build PT OPs #3894 feat(pt): add more information to summary and error message of loading library #3895 docs: cleanup out-of-date doc_only_tf_supported in arguments #3896 feat(pt): support training/profiling argument in PT #3897 fix(cc,pt): translate PT exceptions to the DeePMD-kit exception #3918 docs: developer docs for the universal unit tests #3921 feat: support array API #3922 fix(tf): prevent fitting_attr variable scope from becoming fitting_attr_1 #3930 style: enable B904 #3956 ci(deps): bump uv to 0.2.24 #3964 feat: add plugin entry point for PT #3965 fix(cmake): fix USE_PT_PYTHON_LIBS #3972 fix(cmake): set C++ standard according to the PyTorch version #3973 fix(cmake): fix set_if_higher #3977 fix(pt): ensure suffix of --init_model and --restart is .pt #3980 docs: document PYTORCH_ROOT #3981 docs: Disallow improper capitalization #3982 fix(pt): do not overwrite disp_file when restarting training #3985 fix(cc): compile select_map<int> when TensorFlow backend is off #3987 feat: add documentation and options for multi-task arguments #3989 feat: allow model arguments to be registered outside #3995 fix(cc): add atomic argument to DeepPotBase::computew #3996 style: require explicit device and dtype #4001 feat: add get_model classmethod to BaseModel #4002 fix: fix errors for zero atom inputs #4005 ci: pin PT to 2.3.1 when using CUDA #4009 fix(c): call C++ interface without atomic properties when they are not requested #4010 fix(lmp): call model deviation interface without atomic properties when they are not requested #4012 fix(cc): fix message passing when nloc is 0 #4021 style: enable N804 and N805 #4024 feat: plain text model format #4025 fix: fix nopbc in dpdata driver #4027 fix: manage testing models in a standard way #4028 fix: fix LAMMPS MPI tests with mpi4py 4.0.0 #4032 chore(deps): bump scikit-build-core to 0.9.x #4038 fix: fix PT AutoBatchSize OOM bug and merge execute_all into base #4047 feat: make dp neighbor-stat --type-map optional #4049 ci: test Python 3.12 #4059 fix: replace datetime.datetime.utcnow which is deprecated #4067 breaking: drop C++ 11 #4068 docs: improve docs for environment variables #4070 docs: dynamically generate command outputs #4071 feat: load customized OP library in the C++ interface #4073 docs: improve error message for inconsistent type maps #4074 docs: add multiple packages to intersphinx_mapping #4075 docs: document CMake variables using Sphinx styles #4079 docs: update ipi installation command #4081 docs: fix the default value of DP_ENABLE_PYTORCH #4083 fix: bump LAMMPS to stable_29Aug2024 #4088 ci: add include-hidden-files to actions/upload-artifact #4095 fix(pt): fix ValueError when array byte order is not native #4100 fix(pt): convert torch.__version__ to str when serializing #4106 #4110 #4111 #4113 #4131 #4134 #4136 #4144 #4146 #4147 #4152 #4153 #4155 #4156 #4160 #4172 #4176 #4178 chore: bump LAMMPS to stable_29Aug2024_update1 #4179 #4180 breaking: drop Python 3.8 support #4185 #4187 #4190 #4196 #4199 #4200 #4204 #4212 #4213 #4214 #4217 #4218 #4219 #4220 #4221 #4226 #4228 #4230 #4236 #4238 #4239 #4240 #4242 #4247 #4251 #4252 #4254 #4256 #4257 #4258 #4259 #4260 #4261 #4263 #4264 #4269 #4271 #4274 #4275 #4278 #4284 #4285 #4286 #4287 #4288 #4289 #4290 #4293 #4294 #4301 #4304 #4307 #4309 #4313 #4315 #4318 #4319 #4320 #4325 #4326 #4327 #4329 #4330 #4331 #4336 #4338 #4341 #4342 #4343 #4345 #4350 #4351 #4352 #4355 #4356 #4357 #4363 #4365 #4369 #4377 #4383 #4384 docs: clean up deprecated deepmodeling conda channel docs #4385 #4386 #4387 #4388 #4390 #4391 #4392 #4402 #4403 #4404 #4405 #4406

We also thank everyone who tested and reported bugs in the past eight months.
This discussion was created from the release v3.0.0.