This project is for arXiv:2310.18582.
We present a data-driven method to learn stochastic reduced models of complex systems that retain a state-dependent memory beyond the standard generalized Langevin equation (GLE) with a homogeneous kernel. The constructed model naturally encodes the heterogeneous energy dissipation by jointly learning a set of state features and the non-Markovian coupling among the features. Numerical results demonstrate the limitation of the standard GLE and the essential role of the broadly overlooked state-dependency nature in predicting molecule kinetics related to conformation relaxation and transition.
Consider a polymer molecule consisting of 16 atoms. The resolved variable is defined as the end-to-end distance (see the paper).
The example is in the folder 'case_unimodal'; 'main.m' shows how to run the scripts.
- Compute the probability distribution function with 'step1_PDF.m', which provides the free energy and the conservative force ('data/PDF.mat').
- Compute the single (1D) feature $h(q)$ with 'step2_hx.m'. This is done by considering the state dependency only at $t=0$ ('data/PDF.mat').
- Compute the two-point correlation functions with 'step3_corr.m' and 'step4_hx_corr.m' to construct the 1D kernel ('data/corr.mat' and 'data/hx_corr.mat').
- Compute the three-point correlation functions for the N-feature (ND) state-dependent kernel with 'step3_training_set.m' and 'step4_collect_training_set.m' ('data/dx_10_w_501.mat').
- Train the model with 'train.py' ('MD_ND_2.mat').
- Simulate the standard GLE model and the state-dependent GLE models with 'step5_std_GLE.m', 'step5_hx_GLE_1D.m' and 'step5_hx_GLE_2D.m' (mat files in 'GLE_data').
- Compute the correlation functions of all the reduced models with 'step6_GLE_corr.m' ('corr_GLE.mat', 'corr_hx_GLE_1D.mat', 'corr_hx_GLE_2D.mat').
- The visualization is at the end of 'main.m'.
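The first step above (probability distribution, free energy, conservative force) can be sketched in a few lines. This is a minimal illustration and not the contents of 'step1_PDF.m': the function name `free_energy_and_force`, the histogram estimator, the bin count, and the unit choice `kT = 1` are all assumptions made for this example, and the Gaussian samples stand in for real trajectory data.

```python
import numpy as np

def free_energy_and_force(samples, bins=50, kT=1.0):
    """Histogram estimate of the PDF rho(q), the free energy
    U(q) = -kT * ln(rho(q)), and the conservative force F(q) = -dU/dq.
    (Hypothetical helper; kT = 1 is an assumed unit choice.)"""
    rho, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = rho > 0                 # ln(rho) is undefined in empty bins
    q = centers[mask]
    U = -kT * np.log(rho[mask])
    U -= U.min()                   # fix the arbitrary additive constant
    F = -np.gradient(U, q)         # finite-difference derivative on the grid q
    return q, rho[mask], U, F

# Synthetic stand-in for samples of the resolved variable (not real MD data).
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=200_000)
q, rho, U, F = free_energy_and_force(samples, bins=40)
```

For these Gaussian samples the free energy is close to a parabola centered at 2, so the force is close to linear near the minimum. A smoother density estimator may be preferable when the data are sparse.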
The two figures show the probability distribution and free energy without an energy barrier.
The following two figures show the velocity correlation.
The following figure shows the distribution of the period during which the molecule stays in a certain conformation state.
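A distribution of this kind can be collected from a trajectory by measuring the length of each uninterrupted visit to the state. The sketch below is illustrative only: the helper name `residence_times`, the boolean `in_state` indicator, and the time step `dt` are assumptions, and how a conformation state is defined (for example, by thresholding the resolved variable) is left to the user.

```python
import numpy as np

def residence_times(in_state, dt=1.0):
    """Durations of each uninterrupted visit to a state.

    in_state: boolean sequence, True at the time steps where the
    trajectory is inside the state (the state definition is up to the user).
    Visits touching either end of the trajectory are truncated by the data."""
    x = np.asarray(in_state, dtype=bool).astype(np.int8)
    padded = np.concatenate(([0], x, [0]))
    d = np.diff(padded)
    starts = np.flatnonzero(d == 1)    # index where each visit begins
    ends = np.flatnonzero(d == -1)     # index one past where each visit ends
    return (ends - starts) * dt

# Toy indicator sequence: visits of 3, 2 and 1 steps.
in_state = np.array([0, 1, 1, 1, 0, 0, 1, 1, 0, 1], dtype=bool)
times = residence_times(in_state, dt=0.5)
```

A histogram of `times` (for example with `np.histogram`) then gives the empirical distribution.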
Consider a benzyl bromide molecule in an aqueous environment. The full system consists of one benzyl bromide molecule and 2400 water molecules, with periodic boundary conditions imposed along each direction. The resolved variable is defined as the distance between the bromine atom and the ipso-carbon atom (see the arXiv paper).
The example is in the folder 'case_bimodal'; 'main.m' shows how to run the scripts. The parameters here are smaller than the ones used in the paper. The number of bases for
- Compute the probability distribution function with 'step1_PDF.m', which provides the free energy and the conservative force ('data/PDF.mat').
- Compute the two-point correlation functions with 'step2_std_corr.m' to construct the 1D kernel ('data/corr.mat').
- Compute the three-point correlation functions for the ND state-dependent kernel with 'step3_training_set.m' and 'step4_collect_training_set.m' ('data/dx_0.2_w_301.mat').
- Train the model with 'train.py' ('MD_ND_4.mat' and 'MD_ND_4_std.mat' are the models used in the paper; 'MD_ND_4_lite.mat' and 'MD_ND_4_std_lite.mat' are the corresponding lite versions, provided because of the size limitation).
- Simulate the standard GLE model and the state-dependent GLE model with 'step5_std_GLE.m' and 'step5_hx_GLE.m'. 'step5_hx_GLE_fast_conv.m' does the same thing as 'step5_hx_GLE.m' but evaluates the convolution with a fast convolution algorithm.
- Compute the correlation functions of all the reduced models with 'step6_GLE_corr.m' ('corr_GLE.mat', 'corr_ML_4D.mat').
- The visualization is at the end of 'main.m'.
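The simulation step above mentions a fast convolution variant. One common way to speed up a discretized memory integral $\int_0^t \theta(t-s)\,v(s)\,ds$ is to evaluate it with FFTs instead of a direct sum. The sketch below illustrates that idea only; whether 'step5_hx_GLE_fast_conv.m' uses this particular algorithm is an assumption, and the function names, the rectangle-rule quadrature, and the toy kernel are all hypothetical.

```python
import numpy as np

def memory_term_direct(kernel, v, dt):
    """Direct O(n^2) evaluation of out[i] = dt * sum_k kernel[k] * v[i-k]."""
    n = len(v)
    out = np.zeros(n)
    for i in range(n):
        m = min(i + 1, len(kernel))          # kernel may be shorter than the history
        out[i] = dt * np.dot(kernel[:m], v[i::-1][:m])
    return out

def memory_term_fft(kernel, v, dt):
    """Same causal convolution in O(n log n) using zero-padded FFTs."""
    n = len(v)
    size = n + len(kernel) - 1               # length of the full linear convolution
    nfft = 1 << (size - 1).bit_length()      # next power of two >= size
    full = np.fft.irfft(np.fft.rfft(kernel, nfft) * np.fft.rfft(v, nfft), nfft)
    return dt * full[:n]

# Toy kernel and velocity history (illustrative values, not from the paper).
dt = 0.01
t = np.arange(300) * dt
kernel = np.exp(-2.0 * t) * np.cos(5.0 * t)
v = np.random.default_rng(0).normal(size=1000)

slow = memory_term_direct(kernel, v, dt)
fast = memory_term_fft(kernel, v, dt)
```

Both functions return the same values up to floating-point error. Note that this form needs the whole velocity history at once; inside a time-stepping loop, where the history grows step by step, a blockwise or otherwise adapted scheme is required.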
The two figures show the probability distribution and free energy with two local minima.
The following two figures show the velocity correlation.
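A velocity correlation function of the kind shown in these figures can be estimated from a single long time series. The snippet below is a generic sketch, not the code in 'step6_GLE_corr.m': the function name `autocorrelation`, the FFT-based estimator, and the synthetic AR(1) series used as input are assumptions, and the velocity is assumed to have zero mean, so no mean is subtracted.

```python
import numpy as np

def autocorrelation(v, max_lag):
    """Unbiased estimate of C(k) = <v(t) v(t + k)> for k = 0 .. max_lag - 1,
    computed via FFT (assumes a stationary, zero-mean signal)."""
    v = np.asarray(v, dtype=float)
    n = len(v)
    nfft = 1 << (2 * n - 1).bit_length()     # zero-pad to avoid circular wrap-around
    f = np.fft.rfft(v, nfft)
    acf = np.fft.irfft(f * np.conj(f), nfft)[:max_lag]
    acf /= np.arange(n, n - max_lag, -1)     # divide by the number of pairs per lag
    return acf

# Synthetic zero-mean series standing in for a velocity trajectory.
rng = np.random.default_rng(1)
v = np.zeros(5000)
for k in range(1, len(v)):
    v[k] = 0.9 * v[k - 1] + rng.normal()

vacf = autocorrelation(v, max_lag=50)
vacf_normalized = vacf / vacf[0]             # normalized so that C(0) = 1
```

Applying the same estimator to the full-model data and to each reduced model gives curves that can be compared directly.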
Due to the storage limitation of GitHub, only part of the data is uploaded. The full data, including the MD trajectories (example 1), the positions and velocities of the resolved variables, and the simulation data (example 1), can be accessed through Globus at this link: https://app.globus.org/file-manager?origin_id=ec51ed95-bc26-44a4-a8a0-65b74d694c33&origin_path=%2F
The Python environment is given in the file 'conda-environment.txt'.
The MATLAB version is 2022a.
The training was performed on V100s.