A benchmark for Knowledge Graph Extraction from Total Synthesis documents.
import networkx as nx

from gosybench.basetypes import STree
from gosybench.evaluate import GOSyBench
from gosybench.metrics import GraphEval, TreeMetrics


def test_method(path: str) -> STree:
    # Define your method for KGE here.
    return STree(products=[], graph=nx.DiGraph())


gosybench = GOSyBench(
    project="my-eval",
    describe=TreeMetrics(),
    metrics=GraphEval(),
)

# Evaluate
gosybench.evaluate(test_method)
The most recent code and data can be installed directly from GitHub with:
$ pip install git+https://github.com/schwallergroup/gosybench.git
Optionally, you can install Jasyntho, our package for KGE.
$ pip install "git+https://github.com/schwallergroup/gosybench.git#egg=gosybench[jasyntho]"
See advanced usage.
Jasyntho is a package for Knowledge Graph Extraction of Total Syntheses. It relies on LLMs for some core functionalities.
Make sure to create a .env file with the API keys of the LLM providers you want to use:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
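Before starting a long extraction run, it can help to confirm that the expected keys are actually present in the file. The following is a minimal standalone sketch of such a pre-flight check; `load_env_file` and `missing_keys` are hypothetical helpers written for this example, not part of Jasyntho, and the source does not state how Jasyntho itself reads the `.env` file.

```python
import tempfile
from pathlib import Path


def load_env_file(path):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required):
    """Return the required key names that are absent or empty."""
    return [name for name in required if not env.get(name)]


# Demo against a temporary file so the example never touches a real .env.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        "OPENAI_API_KEY=sk-placeholder\n# ANTHROPIC_API_KEY is not set\n",
        encoding="utf-8",
    )
    demo_env = load_env_file(env_path)
    demo_missing = missing_keys(demo_env, ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"])

print(demo_missing)
```

Only the keys for the providers you intend to use need to be present, so the `required` list should be adjusted accordingly.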
Download the paper you want to extract into a directory laid out like this:
jacs.9b12546
├── doi.txt
├── paper.pdf
└── si_0.pdf
paper.pdf is the main article, and si_0.pdf is the Supplementary Information of that article.
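A quick way to catch a misnamed file before running the extraction is to check the directory against this layout. The sketch below uses only the standard library; `check_paper_dir` is a hypothetical helper, and it assumes that Supplementary Information files follow the `si_<index>.pdf` pattern shown in the example and that at least one is expected, neither of which the source states explicitly.

```python
import tempfile
from pathlib import Path


def check_paper_dir(path):
    """Return a list of problems found in a paper directory (hypothetical helper)."""
    root = Path(path)
    problems = []
    for name in ("doi.txt", "paper.pdf"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    # Assumption: SI files are named si_<index>.pdf, as in the example layout.
    if not sorted(root.glob("si_*.pdf")):
        problems.append("no si_*.pdf file found")
    return problems


# Demo in a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    paper_dir = Path(tmp) / "jacs.9b12546"
    paper_dir.mkdir()
    (paper_dir / "doi.txt").write_text("placeholder\n", encoding="utf-8")
    (paper_dir / "paper.pdf").write_bytes(b"%PDF-1.4\n")
    incomplete = check_paper_dir(paper_dir)  # SI file not created yet
    (paper_dir / "si_0.pdf").write_bytes(b"%PDF-1.4\n")
    complete = check_paper_dir(paper_dir)

print(incomplete)
print(complete)
```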
Then, use Jasyntho like:
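from jasyntho import SynthTree

# `path`, `model`, `method`, `si_select`, and `ExtractReaction` must be
# defined or imported before this snippet runs.
# `await` requires an async context (a coroutine or a notebook cell).
tree = SynthTree.from_dir(path)
tree.rxn_extract = ExtractReaction(llm=model)
tree.raw_prods = await tree.async_extract_rss(
    mode=method, si_select=si_select
)
tree.products = [p for p in tree.raw_prods if not p.isempty()]
tree.full_g = tree.get_full_graph(tree.products)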
Andres M Bran, Zlatko Jončev, and Philippe Schwaller. 2024. Knowledge Graph Extraction from Total Synthesis Documents. In Proceedings of the 1st Workshop on Language + Molecules (L+M 2024), pages 74–84, Bangkok, Thailand. Association for Computational Linguistics.
@inproceedings{m-bran-etal-2024-knowledge,
title = "Knowledge Graph Extraction from Total Synthesis Documents",
author = "M Bran, Andres and Jon{\v{c}}ev, Zlatko and Schwaller, Philippe",
booktitle = "Proceedings of the 1st Workshop on Language + Molecules (L+M 2024)",
year = "2024",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.langmol-1.9",
doi = "10.18653/v1/2024.langmol-1.9",
pages = "74--84",
}
See developer instructions
Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.
The code in this package is licensed under the MIT License.
This package was created with @audreyfeldroy's cookiecutter package using @cthoyt's cookiecutter-snekpack template.
This final section of the README is for those who want to get involved by making a code contribution.
To install in development mode, use the following:
$ git clone https://github.com/schwallergroup/gosybench.git
$ cd gosybench
$ pip install -e .
After cloning the repository and installing tox with pip install tox, the unit tests in the tests/ folder can be run reproducibly with:
$ tox
Additionally, these tests are automatically re-run with each commit in a GitHub Action.
The documentation can be built locally using the following:
$ git clone https://github.com/schwallergroup/gosybench.git
$ cd gosybench
$ tox -e docs
$ open docs/build/html/index.html
The documentation build automatically installs the package as well as the docs extra specified in setup.cfg. Sphinx plugins like texext can be added there. Additionally, they need to be added to the extensions list in docs/source/conf.py.
After installing the package in development mode and installing tox with pip install tox, the commands for making a new release are contained within the finish environment in tox.ini. Run the following from the shell:
$ tox -e finish
This script does the following:
- Uses Bump2Version to switch the version number in setup.cfg, src/gosybench/version.py, and docs/source/conf.py to not have the -dev suffix
- Packages the code in both a tar archive and a wheel using build
- Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step
- Pushes to GitHub. You'll need to make a release going with the commit where the version was bumped.
- Bumps the version to the next patch. If you made big changes and want to bump the version by minor, you can use tox -e bumpversion -- minor after.
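The version changes in the steps above can be illustrated with a small sketch. It is not Bump2Version itself: `strip_dev` and `bump` are hypothetical helpers, the starting version is made up, and the sketch assumes that the `-dev` suffix is added back after each bump, which depends on how Bump2Version is configured in this repository.

```python
def strip_dev(version):
    """Drop a trailing -dev suffix, mirroring the release step."""
    suffix = "-dev"
    return version[: -len(suffix)] if version.endswith(suffix) else version


def bump(version, part="patch"):
    """Return the next development version after bumping one part.

    Assumption: the result carries a -dev suffix again.
    """
    major, minor, patch = (int(piece) for piece in strip_dev(version).split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown version part: {part}")
    return f"{major}.{minor}.{patch}-dev"


released = strip_dev("0.1.0-dev")          # version that gets packaged and uploaded
next_patch = bump(released)                # default follow-up bump
next_minor = bump(next_patch, "minor")     # optional manual minor bump afterwards

print(released, next_patch, next_minor)
```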