schwallergroup/gosybench

GOSyBench

DOI: 10.18653/v1/2024.langmol-1.9

A benchmark for Knowledge Graph Extraction from Total Synthesis documents.

💪 Getting Started

import networkx as nx  # needed for nx.DiGraph() below

from gosybench.basetypes import STree
from gosybench.evaluate import GOSyBench
from gosybench.metrics import GraphEval, TreeMetrics


def test_method(path: str) -> STree:
    # Define your method for KGE here.
    return STree(products=[], graph=nx.DiGraph())

gosybench = GOSyBench(
    project="my-eval",
    describe=TreeMetrics(),
    metrics=GraphEval(),
)

# Evaluate
gosybench.evaluate(test_method)

🚀 Installation

The most recent code and data can be installed directly from GitHub with:

$ pip install git+https://github.com/schwallergroup/gosybench.git

Optionally, you can also install Jasyntho, our package for knowledge graph extraction (KGE):

$ pip install "git+https://github.com/schwallergroup/gosybench.git#egg=gosybench[jasyntho]"

🚀 Advanced Usage

See advanced usage.

🌱 Jasyntho

Jasyntho is a package for Knowledge Graph Extraction of Total Syntheses. It relies on LLMs for some core functionalities.

Make sure to create an .env file with the API keys of the LLM providers you want to use:

OPENAI_API_KEY=sk-... 
ANTHROPIC_API_KEY=sk-ant-...
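
Before starting an extraction, it can help to confirm that the keys are actually visible to the Python process. The following is a minimal sketch using only the standard library. `missing_api_keys` is a hypothetical helper, not part of Jasyntho, and it assumes the values from .env have already been loaded into the environment (for example by your shell or a dotenv loader). Pass only the key names for the providers you use.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Key names taken from the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(
    required: Iterable[str] = REQUIRED_KEYS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All listed API keys are set.")
```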

Download the paper you want to process into a directory laid out like this:

jacs.9b12546
    ├── doi.txt
    ├── paper.pdf
    └── si_0.pdf

paper.pdf is the main article, and si_0.pdf is the Supplementary Information of that article.
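
A quick way to check a paper directory against this layout is sketched below. `missing_paper_files` is a hypothetical helper, not part of Jasyntho, and treating all three files as required is an assumption based only on the example above.

```python
from pathlib import Path
from typing import List

# File names from the example layout; whether each one is strictly
# required by Jasyntho is an assumption.
EXPECTED_FILES = ("doi.txt", "paper.pdf", "si_0.pdf")


def missing_paper_files(paper_dir: str) -> List[str]:
    """Return the expected file names that are absent from `paper_dir`."""
    root = Path(paper_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "jacs.9b12546" is the example directory name used above.
    missing = missing_paper_files("jacs.9b12546")
    print("Missing files:", missing if missing else "none")
```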

Then, use Jasyntho like:

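from jasyntho import SynthTree

# `path` is the paper directory shown above (e.g. "jacs.9b12546").
# `ExtractReaction`, `model`, `method`, and `si_select` are not defined in this
# snippet; import or set them for your setup before running it.
tree = SynthTree.from_dir(path)
tree.rxn_extract = ExtractReaction(llm=model)

# `await` needs an async context, such as a notebook cell or an `async def` function.
tree.raw_prods = await tree.async_extract_rss(
    mode=method, si_select=si_select
)
# Keep only the non-empty extracted products, then build the full graph.
tree.products = [p for p in tree.raw_prods if not p.isempty()]
tree.full_g = tree.get_full_graph(tree.products)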

✅ Citation

Andres M Bran, Zlatko Jončev, and Philippe Schwaller. 2024. Knowledge Graph Extraction from Total Synthesis Documents. In Proceedings of the 1st Workshop on Language + Molecules (L+M 2024), pages 74–84, Bangkok, Thailand. Association for Computational Linguistics.

@inproceedings{m-bran-etal-2024-knowledge,
    title = "Knowledge Graph Extraction from Total Synthesis Documents",
    author = "M Bran, Andres  and  Jon{\v{c}}ev, Zlatko  and Schwaller, Philippe",
    booktitle = "Proceedings of the 1st Workshop on Language + Molecules (L+M 2024)",
    year = "2024",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.langmol-1.9",
    doi = "10.18653/v1/2024.langmol-1.9",
    pages = "74--84",
 }

πŸ› οΈ For Developers

See developer instructions

πŸ‘ Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

👋 Attribution

⚖️ License

The code in this package is licensed under the MIT License.

πŸͺ Cookiecutter

This package was created with @audreyfeldroy's cookiecutter package using @cthoyt's cookiecutter-snekpack template.

πŸ› οΈ For Developers

See developer instructions

This final section of the README is for those who want to get involved by contributing code.

Development Installation

To install in development mode, use the following:

$ git clone https://github.com/schwallergroup/gosybench.git
$ cd gosybench
$ pip install -e .

🥼 Testing

After cloning the repository and installing tox with pip install tox, the unit tests in the tests/ folder can be run reproducibly with:

$ tox

Additionally, these tests are automatically re-run with each commit in a GitHub Action.

📖 Building the Documentation

The documentation can be built locally using the following:

$ git clone https://github.com/schwallergroup/gosybench.git
$ cd gosybench
$ tox -e docs
$ open docs/build/html/index.html

The documentation build automatically installs the package along with the docs extra specified in setup.cfg. Sphinx plugins such as texext can be added there; they also need to be added to the extensions list in docs/source/conf.py.

📦 Making a Release

After installing the package in development mode and installing tox with pip install tox, the commands for making a new release are contained within the finish environment in tox.ini. Run the following from the shell:

$ tox -e finish

This script does the following:

  1. Uses Bump2Version to remove the -dev suffix from the version number in setup.cfg, src/gosybench/version.py, and docs/source/conf.py
  2. Packages the code as both a tar archive and a wheel using build
  3. Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step
  4. Pushes to GitHub. You'll need to create a release on GitHub for the commit where the version was bumped.
  5. Bumps the version to the next patch. If you made big changes and want to bump the minor version instead, you can run tox -e bumpversion -- minor afterwards.
