Caption molecules and materials for pretraining neural networks.
ChemCaption is a tool designed to generate prompts for molecular features to train neural networks.
Here is a quick example using one of the featurizers, which counts the atoms of each specified element in a molecule.
from chemcaption.presets import ORGANIC
from chemcaption.molecules import SMILESMolecule
from chemcaption.featurize.composition import ElementCountFeaturizer
# Molecule we want to featurize
molecule = SMILESMolecule("C1(Br)=CC=CC=C1Br")
# We can either specify the symbol or the full name
el_count_name = ElementCountFeaturizer(['carbon', 'hydrogen', 'oxygen', 'bromine'])
# Featurize the molecule
prompt = el_count_name.text_featurize(molecule=molecule)
The generated prompt contains the following QA pair.
Question: What are the atom counts of Carbon, Hydrogen, Oxygen, and Bromine of the molecule with SMILES Brc1ccccc1Br?
Answer: 6, 4, 0, and 2
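To make the shape of this QA pair concrete, here is a minimal sketch in plain Python that assembles a question and an answer from precomputed element counts. The helper names join_items and build_qa are hypothetical and exist only for illustration. This is not ChemCaption's implementation, and the counts are copied from the example above rather than computed from the SMILES string.

```python
def join_items(items):
    """Join values as 'a', 'a and b', or 'a, b, and c'."""
    items = [str(item) for item in items]
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def build_qa(smiles, counts):
    """Build a (question, answer) pair from a mapping of element name -> count."""
    names = [name.capitalize() for name in counts]
    question = (
        f"What are the atom counts of {join_items(names)} "
        f"of the molecule with SMILES {smiles}?"
    )
    answer = join_items(counts.values())
    return question, answer


# Counts copied from the example above (hypothetical input, not computed here).
example_counts = {"carbon": 6, "hydrogen": 4, "oxygen": 0, "bromine": 2}
question, answer = build_qa("Brc1ccccc1Br", example_counts)
print("Question:", question)
print("Answer:", answer)
```

The order of the answer follows the order in which the elements were requested, which is why the zero for oxygen appears in the third position.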
For more details and all other available featurizers please visit the documentation.
The most recent release can be installed from PyPI with:
pip install chemcaption
The most recent code and data can be installed directly from GitHub with:
pip install git+https://github.com/lamalab-org/chem-caption
Some of the ChemCaption featurizers depend on morfeus and might require additional dependencies to be installed. See the morfeus-ml project for the full list of its optional dependencies.
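Because the morfeus-based featurizers only work when morfeus is present, it can help to check for it up front. The following is a minimal sketch that uses only the standard library. The helper is_available is hypothetical, and the import name "morfeus" for the morfeus-ml distribution is an assumption.

```python
from importlib.util import find_spec


def is_available(module_name):
    """Return True if a top-level module can be found in this environment."""
    return find_spec(module_name) is not None


# Assumption: the morfeus-ml distribution is imported under the name "morfeus".
if is_available("morfeus"):
    print("morfeus found: featurizers that depend on it should be importable.")
else:
    print("morfeus not found: install morfeus-ml to use featurizers that depend on it.")
```

This check only confirms that the module can be located. It does not verify that morfeus's own optional dependencies are installed.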
Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.
The code in this package is licensed under the MIT License.
This package was created with @audreyfeldroy's cookiecutter package using @cthoyt's cookiecutter-snekpack template.
See developer instructions
This final section of the README is for those who want to get involved by contributing code.
To install in development mode, use the following:
$ git clone https://github.com/lamalab-org/chem-caption
$ cd chem-caption
$ pip install -e .
After cloning the repository and installing nox with pip install nox, the unit tests in the tests/ folder can be run reproducibly with:
$ nox
Additionally, these tests are automatically re-run with each commit in a GitHub Action.
The documentation can be built locally using the following:
$ git clone https://github.com/lamalab-org/chem-caption
$ cd chem-caption
$ nox --session docs
$ open docs/build/html/index.html
The documentation build automatically installs the package as well as the docs extra specified in setup.cfg. Sphinx plugins like texext can be added there. Additionally, they need to be added to the extensions list in docs/source/conf.py.
After installing the package in development mode and installing nox with pip install nox, the commands for making a new release are contained within the finish environment in noxfile.py. Run the following from the shell:
$ nox --session finish
This script does the following:
- Uses Bump2Version to switch the version number in setup.cfg, src/chemcaption/version.py, and docs/source/conf.py to not have the -dev suffix
- Packages the code in both a tar archive and a wheel using build
- Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step
- Push to GitHub. You'll need to make a release going with the commit where the version was bumped.
- Bump the version to the next patch. If you made big changes and want to bump the version by minor, you can use nox -e bumpversion -- minor after.
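The version changes in the steps above can be sketched in a few lines of Python. This is only an illustration of the numbering scheme, not how Bump2Version is implemented or configured in this repository. The function names strip_dev_suffix and bump and the starting version 0.1.1-dev are hypothetical.

```python
def strip_dev_suffix(version):
    """Turn a development version such as '0.1.1-dev' into the release '0.1.1'."""
    return version[: -len("-dev")] if version.endswith("-dev") else version


def bump(version, part="patch"):
    """Increment one part of a 'major.minor.patch' version and mark it as '-dev'."""
    major, minor, patch = (int(piece) for piece in strip_dev_suffix(version).split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown version part: {part!r}")
    return f"{major}.{minor}.{patch}-dev"


current = "0.1.1-dev"                # hypothetical starting version
release = strip_dev_suffix(current)  # version that gets packaged and uploaded
next_dev = bump(release)             # default: next patch, back in development
print(release)
print(next_dev)
print(bump(next_dev, "minor"))       # optional minor bump applied afterwards
```

In this sketch a minor bump resets the patch number to zero, which mirrors the usual semantic versioning convention.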