DS4SD/MarkushGrapher

MarkushGrapher


This is the repository for MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures.


Citation

If you find this repository useful, please consider citing:

@article{morin2025markushgrapherjointvisualtextual,
	title        = {{MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures}},
	author       = {Lucas Morin and Valéry Weber and Ahmed Nassar and Gerhard Ingmar Meijer and Luc Van Gool and Yawei Li and Peter Staar},
	year         = 2025,
	journal      = {arXiv preprint arXiv:2503.16096},
	url          = {https://arxiv.org/abs/2503.16096},
	eprint       = {2503.16096},
	archiveprefix = {arXiv},
	primaryclass = {cs.CV}
}

Installation

  1. Create a virtual environment.
python3.10 -m venv markushgrapher-env
source markushgrapher-env/bin/activate
  2. Install MarkushGrapher.
pip install -e .
  3. Install the transformers fork, which contains the code for the MarkushGrapher architecture. The architecture was written starting from a copy of the UDOP architecture.
git clone https://github.com/lucas-morin/transformers.git ./external/transformers
pip install -e ./external/transformers
  4. Install the MolScribe fork, which contains minor fixes for compatibility with albumentations.
git clone https://github.com/lucas-morin/MolScribe.git ./external/MolScribe
pip install -e ./external/MolScribe --no-deps
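
Once the installation steps are done, a quick import check can confirm that the environment is usable. The following is a minimal sketch, not part of the repository: `check_install` is a hypothetical helper, `markushgrapher` is the module name used by the commands in this README, and `molscribe` is assumed to be the import name exposed by the MolScribe fork.

```python
from importlib.util import find_spec

# Import names expected after installation. "molscribe" is an assumption
# about how the MolScribe fork exposes its package.
REQUIRED_MODULES = ("markushgrapher", "transformers", "molscribe")


def check_install(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    status = {}
    for name in modules:
        try:
            status[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-installed packages.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_install().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

The check only locates the packages and does not import them, so it stays fast even when the deep-learning dependencies are heavy.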

Model

Download the MarkushGrapher model from HuggingFace.

huggingface-cli download ds4sd/MarkushGrapher --local-dir ./tmp/ --repo-type model && cp -r ./tmp/models . && rm -r ./tmp/

Download the MolScribe model from HuggingFace.

wget https://huggingface.co/yujieq/MolScribe/resolve/main/swin_base_char_aux_1m680k.pth -P ./external/MolScribe/ckpts/ 
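
After both downloads, it can help to verify that the files landed where the commands above put them. The sketch below assumes it is run from the repository root and only checks the two locations named in those commands (`./models` and the MolScribe checkpoint); `missing_artifacts` is a hypothetical helper, not a function provided by MarkushGrapher.

```python
from pathlib import Path

# Paths taken from the download commands above, relative to the repository root.
EXPECTED_PATHS = (
    Path("models"),
    Path("external/MolScribe/ckpts/swin_base_char_aux_1m680k.pth"),
)


def missing_artifacts(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("all model files found")
```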

Datasets

Download the datasets from HuggingFace.

huggingface-cli download ds4sd/MarkushGrapher-Datasets --local-dir ./data/hf --repo-type dataset

For training, we use:

  1. MarkushGrapher-Synthetic-Training (Synthetic dataset)

For benchmarking, we use:

  1. M2S (Multi-modal real-world dataset)
  2. USPTO-Markush (Image-only real-world dataset)
  3. MarkushGrapher-Synthetic (Synthetic dataset)

The synthetic datasets are generated using MarkushGenerator.

Inference

Note: MarkushGrapher cannot currently process images without OCR annotations. The model requires OCR bounding boxes and the corresponding text as input.

  1. Select a dataset by setting the dataset_path parameter in MarkushGrapher/config/dataset_predict.yaml.

  2. Run MarkushGrapher.

python3.10 -m markushgrapher.eval config/predict.yaml
  3. View the prediction visualizations in MarkushGrapher/data/visualization/prediction/.
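
The inference steps above can also be scripted. The sketch below is one possible approach under stated assumptions: it treats the YAML file as plain text and rewrites the first `dataset_path:` line, so it does not need a YAML library, but it drops any trailing comment on that line. `set_dataset_path` and `run_prediction` are hypothetical helpers, and the layout of the configuration file beyond the `dataset_path` key is not assumed.

```python
import re
import subprocess
import sys
from pathlib import Path


def set_dataset_path(config_file, new_path):
    """Rewrite the first `dataset_path:` line in a YAML file, keeping indentation."""
    config_file = Path(config_file)
    text = config_file.read_text()
    pattern = re.compile(r"^([ \t]*dataset_path[ \t]*:).*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in new_path.
    updated, count = pattern.subn(
        lambda m: f"{m.group(1)} {new_path}", text, count=1
    )
    if count == 0:
        raise KeyError(f"no dataset_path entry in {config_file}")
    config_file.write_text(updated)


def run_prediction(config="config/predict.yaml"):
    """Run markushgrapher.eval with the interpreter of the active environment."""
    cmd = [sys.executable, "-m", "markushgrapher.eval", config]
    return subprocess.run(cmd, check=True)
```

A typical use would call `set_dataset_path("config/dataset_predict.yaml", ...)` with the path of one of the downloaded benchmark datasets, then `run_prediction()`, both from the MarkushGrapher directory.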

Training

  1. Select the training configuration in MarkushGrapher/config/train.yaml and MarkushGrapher/config/datasets/datasets.yaml.

  2. Run the training script.

PYTHONUNBUFFERED=1 CUDA_VISIBLE_DEVICES=0 python3.10 -m markushgrapher.train config/train.yaml
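
The same launch can be expressed in Python, which is convenient when training is started from another script. This is a minimal sketch: `training_env` and `launch_training` are hypothetical helpers, and passing more than one GPU id only sets `CUDA_VISIBLE_DEVICES`. Whether the training code makes use of several GPUs is not stated in this README.

```python
import os
import subprocess
import sys


def training_env(gpu_ids=(0,), base_env=None):
    """Copy the environment and add the two variables used in the command above."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONUNBUFFERED"] = "1"  # flush log output immediately
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch_training(config="config/train.yaml", gpu_ids=(0,)):
    """Start markushgrapher.train with the interpreter of the active environment."""
    cmd = [sys.executable, "-m", "markushgrapher.train", config]
    return subprocess.run(cmd, env=training_env(gpu_ids), check=True)
```

Using `sys.executable` instead of a hard-coded `python3.10` keeps the launch inside whichever virtual environment is currently active.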

Acknowledgments

MarkushGrapher uses the code of UDOP and the MolScribe model.

MarkushGrapher was trained from the pre-trained UDOP weights available on HuggingFace (checkpoint: udop-unimodel-large-512-300k-steps.zip).

About

[CVPR 25] MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
