A selection of tools for analysing ASE databases generated by G-SchNet, complete with a nice CLI for ease of use.
- Parsing of 'bonding' features from databases (number of elements, bonds, rings, aromaticity, etc.) and saving as new ASE databases with full metadata.
- Fitting of PCA models for dimensionality reduction of structural (SOAP/MBTR) and bonding (see above) descriptors.
- Transformation and export of principal components from fitted models, for use in chemical space plotting.
- Subsampling of databases to match the molecular weight distribution of a training database.
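To make the subsampling idea concrete, here is a minimal sketch of one way to match a molecular weight distribution: bin the weights, then draw from each bin in proportion to the training set. This is an illustration only and is not taken from the package's internals. The function name `subsample_to_match`, the `bin_width` parameter and the histogram-based approach are all assumptions.

```python
import random
from collections import Counter, defaultdict
from fractions import Fraction


def subsample_to_match(train_weights, candidate_weights, bin_width=25.0, seed=0):
    """Pick candidate indices whose binned weights follow the training distribution.

    Hypothetical helper for illustration; not part of GSchNetTools.
    """
    rng = random.Random(seed)

    def bin_of(weight):
        # Fixed-width molecular weight bins (the width is an arbitrary choice).
        return int(weight // bin_width)

    # Number of training molecules in each bin.
    train_counts = Counter(bin_of(w) for w in train_weights)
    if not train_counts:
        return []

    # Candidate indices grouped by bin.
    by_bin = defaultdict(list)
    for index, weight in enumerate(candidate_weights):
        by_bin[bin_of(weight)].append(index)

    # Largest multiple of the training histogram that every bin can supply.
    # If a training bin has no candidates, this is zero and nothing is drawn.
    scale = min(Fraction(len(by_bin[b]), count) for b, count in train_counts.items())

    chosen = []
    for b, count in train_counts.items():
        chosen.extend(rng.sample(by_bin[b], int(scale * count)))
    return sorted(chosen)


train = [100.0, 110.0, 300.0]
candidates = [105.0] * 4 + [310.0] * 10 + [500.0] * 5
picked = subsample_to_match(train, candidates)
```

Candidates that fall in bins absent from the training set (the 500.0 entries here) are never selected, and the remaining bins keep the 2:1 ratio seen in `train`.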
Since the internals of this package depend on Open Babel, it's probably easiest to install through pip within a conda environment, as the openbabel package is much easier to install through conda.
While in a conda environment with openbabel already installed, clone the repository and then run:
pip install --no-deps ./GSchNetTools
A better installation solution is coming; I just haven't got around to uploading to conda-forge yet.
Once installed, the command line program gstools will be available. This has a full-featured help system that can be accessed by calling gstools help {subcommand}, but to get you started:
- gstools parse [db] {options}: parses a database for bonding features.
- gstools fit_pca [dbs] {options}: fits PCA models on the molecules in one or more databases.
- gstools transform_pca [pca_models] [dbs] {options}: uses the models from the previous command to fetch and save PCs for the descriptors of the molecules in one or more databases.
- gstools sample [train_db] [dbs_to_sample] {options}: subsamples databases according to the molecular weight distribution in train_db.
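The PCA commands are split into fitting and transforming because a model fitted on one set of molecules can then project any other set into the same principal-component space. The sketch below illustrates that idea with plain NumPy on random stand-in descriptor vectors. It is not the package's code: the Python functions `fit_pca` and `transform_pca` are hypothetical and only borrow the subcommand names, and real SOAP/MBTR or bonding descriptors would replace the random arrays.

```python
import numpy as np


def fit_pca(descriptors, n_components=2):
    """Fit a PCA model (mean and principal axes) via SVD of the centred data."""
    X = np.asarray(descriptors, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return {"mean": mean, "components": vt[:n_components]}


def transform_pca(model, descriptors):
    """Project descriptors onto the principal axes of a fitted model."""
    X = np.asarray(descriptors, dtype=float)
    return (X - model["mean"]) @ model["components"].T


rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(50, 6))  # stand-in for one database
other_descriptors = rng.normal(size=(10, 6))  # stand-in for another database

model = fit_pca(train_descriptors, n_components=2)
pcs = transform_pca(model, other_descriptors)  # one row of PCs per molecule
```

Because both sets are centred with the training mean and projected onto the same axes, their principal components can be drawn on a single chemical space plot.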