
lamalab-org/chem-caption


ChemCaption


Caption molecules and materials for pretraining neural networks.

💪 Getting Started

ChemCaption is a tool for generating text prompts that describe molecular features, for use in training neural networks.

Here is a quick example using one of the featurizers, which counts the atoms of each requested element in a molecule.

from chemcaption.molecules import SMILESMolecule
from chemcaption.featurize.composition import ElementCountFeaturizer

# Molecule we want to featurize (1,2-dibromobenzene)
molecule = SMILESMolecule("C1(Br)=CC=CC=C1Br")

# Elements can be specified either by symbol or by full name
el_count = ElementCountFeaturizer(["carbon", "hydrogen", "oxygen", "bromine"])

# Featurize the molecule to obtain a prompt
prompt = el_count.text_featurize(molecule=molecule)

The generated prompt contains the following question-answer pair:

Question: What are the atom counts of Carbon, Hydrogen, Oxygen, and Bromine of the molecule with SMILES Brc1ccccc1Br?
Answer: 6, 4, 0, and 2
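To make the idea behind this featurizer concrete, here is a minimal pure-Python sketch. It is not the ChemCaption API: `count_elements`, `join_items`, and `element_count_prompt` are hypothetical helpers, the molecular formula (C6H4Br2 for 1,2-dibromobenzene) is assumed to be already known rather than derived from the SMILES string, and only simple formulas without parentheses or charges are handled.

```python
import re
from collections import Counter


def count_elements(formula: str) -> Counter:
    """Hypothetical helper (not part of ChemCaption): count atoms in a simple
    molecular formula such as "C6H4Br2" (no parentheses, hydrates, or charges)."""
    counts: Counter = Counter()
    # Each match is an element symbol followed by an optional count
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def join_items(items: list[str]) -> str:
    """Join items as 'a, b, and c' (or 'a and b' for two items)."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def element_count_prompt(
    smiles: str, formula: str, elements: list[tuple[str, str]]
) -> tuple[str, str]:
    """Hypothetical helper: build a question/answer pair in the README's style.

    `elements` is a list of (full name, symbol) pairs; elements absent
    from the formula are reported as 0.
    """
    counts = count_elements(formula)
    names = [name for name, _ in elements]
    # Counter returns 0 for missing keys, so absent elements count as 0
    values = [str(counts[symbol]) for _, symbol in elements]
    question = (
        f"What are the atom counts of {join_items(names)} "
        f"of the molecule with SMILES {smiles}?"
    )
    return question, join_items(values)


question, answer = element_count_prompt(
    "Brc1ccccc1Br",
    "C6H4Br2",
    [("Carbon", "C"), ("Hydrogen", "H"), ("Oxygen", "O"), ("Bromine", "Br")],
)
print(question)
print(answer)
```

For 1,2-dibromobenzene this prints a question listing Carbon, Hydrogen, Oxygen, and Bromine, followed by the answer `6, 4, 0, and 2`.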

For more details and the full list of available featurizers, see the documentation.

🚀 Installation

The most recent release can be installed from PyPI with:

pip install chemcaption

The most recent code and data can be installed directly from GitHub with:

pip install git+https://github.com/lamalab-org/chem-caption

Some ChemCaption featurizers depend on morfeus (published on PyPI as morfeus-ml) and may require additional dependencies to be installed. The full list of optional dependencies for morfeus-ml is given in its documentation.
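Because these featurizers only fail once morfeus is actually needed, it can help to check up front which packages are importable. The sketch below is a generic standard-library check, not a ChemCaption feature; `missing_modules` is a hypothetical helper, and it assumes morfeus-ml is imported under the module name `morfeus`.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Hypothetical helper: return the top-level module names that cannot be found."""
    # find_spec returns None for a top-level module that is not installed,
    # without actually importing it
    return [name for name in names if importlib.util.find_spec(name) is None]


# "morfeus" is the import name of the morfeus-ml distribution
missing = missing_modules(["chemcaption", "morfeus"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("chemcaption and morfeus are both importable")
```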

👐 Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

👋 Attribution

⚖️ License

The code in this package is licensed under the MIT License.

🍪 Cookiecutter

This package was created with @audreyfeldroy's cookiecutter package using @cthoyt's cookiecutter-snekpack template.

🛠️ For Developers


This final section of the README is for anyone who wants to get involved by contributing code.

Development Installation

To install in development mode, use the following:

$ git clone https://github.com/lamalab-org/chem-caption
$ cd chem-caption
$ pip install -e .

🥼 Testing

After cloning the repository and installing nox with pip install nox, the unit tests in the tests/ folder can be run reproducibly with:

$ nox

Additionally, these tests are automatically re-run with each commit in a GitHub Action.

📖 Building the Documentation

The documentation can be built locally using the following:

$ git clone https://github.com/lamalab-org/chem-caption
$ cd chem-caption
$ nox --session docs
$ open docs/build/html/index.html

The docs session automatically installs the package along with the docs extra specified in setup.cfg. Sphinx plugins such as texext can be added there; they also need to be added to the extensions list in docs/source/conf.py.
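As an illustration, a Sphinx extension is enabled by listing its module name in `extensions` in docs/source/conf.py. The excerpt below is a hypothetical sketch, not the project's actual conf.py; `texext.math_dollar` is assumed here as the texext module to enable, so check the texext documentation for the exact extension you need.

```python
# docs/source/conf.py (hypothetical excerpt; the real file defines more settings)
extensions = [
    "sphinx.ext.autodoc",  # built-in Sphinx extension: pull docs from docstrings
    "texext.math_dollar",  # assumed texext module for $-delimited inline math
]
```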

📦 Making a Release

After installing the package in development mode and installing nox with pip install nox, the commands for making a new release are contained in the finish session in noxfile.py. Run the following from the shell:

$ nox --session finish

This script does the following:

  1. Uses Bump2Version to remove the -dev suffix from the version number in setup.cfg, src/chemcaption/version.py, and docs/source/conf.py.
  2. Packages the code as both a tar archive and a wheel using build.
  3. Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step.
  4. Pushes to GitHub. You'll need to create a GitHub release for the commit where the version was bumped.
  5. Bumps the version to the next patch. If you made big changes and want to bump the minor version instead, run nox -e bumpversion -- minor afterwards.
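The version bookkeeping in steps 1 and 5 can be pictured with a small pure-Python sketch. It is not how Bump2Version is implemented: `parse_version`, `release_version`, and `bump_version` are hypothetical helpers, the version numbers are made up, and it assumes versions of the form MAJOR.MINOR.PATCH with an optional -dev suffix that the post-release bump re-adds.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' (optionally ending in '-dev') into integers."""
    core = version.removesuffix("-dev")
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def release_version(version: str) -> str:
    """Step 1: drop the -dev suffix for the release."""
    major, minor, patch = parse_version(version)
    return f"{major}.{minor}.{patch}"


def bump_version(version: str, part: str = "patch") -> str:
    """Step 5 (or '-- minor'): bump one part, reset lower parts, re-add -dev (assumed)."""
    major, minor, patch = parse_version(version)
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown version part: {part!r}")
    return f"{major}.{minor}.{patch}-dev"


released = release_version("0.1.3-dev")      # "0.1.3"
next_dev = bump_version(released)              # "0.1.4-dev"
next_minor = bump_version(next_dev, "minor")   # "0.2.0-dev"
print(released, next_dev, next_minor)
```

The last line mirrors the note in step 5: bumping the minor part after the automatic patch bump resets the patch number to 0.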
