On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation
This repository contains all code to support the paper:
"On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation"
[arXiv](https://arxiv.org/abs/2502.19285)
We developed a vision-language model for the pathology domain of melanocytic lesions. The model was trained and evaluated on a dataset of 19,636 melanocytic lesion cases, each consisting of one or more whole slide images (WSIs) and a pathology report. In total, the dataset comprised 42,433 H&E-stained WSIs and 2,132,008 words. We built upon the BLIP-2 framework, using BioGPT as the base language model and HIPT for WSI feature extraction. To evaluate the model, we assessed cross-modal retrieval performance and conducted a reader study to score the quality of the generated reports.
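As a reference for the cross-modal retrieval evaluation mentioned above, the sketch below shows one common way to compute recall@k from an image-report similarity matrix. This is a generic illustration, not the paper's exact evaluation code; the assumption is that the correct report for image `i` sits at index `i` of the candidate set.

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, k: int) -> float:
    """Fraction of queries whose ground-truth match (same index) ranks in the top-k.

    similarity[i, j] is the score between query i (e.g. a WSI embedding) and
    candidate j (e.g. a report embedding); the correct match for query i is
    assumed to be candidate i.
    """
    ranks = np.argsort(-similarity, axis=1)  # candidate indices, best first
    targets = np.arange(similarity.shape[0])[:, None]
    hits = (ranks[:, :k] == targets).any(axis=1)  # match found in top-k?
    return float(hits.mean())

# Toy example with 3 image-report pairs (hypothetical scores):
sim = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.8],  # query 1 only retrieves its match at rank 2
    [0.1, 0.2, 0.7],
])
print(recall_at_k(sim, k=1))  # 2 of 3 matches ranked first
print(recall_at_k(sim, k=2))  # all matches within top-2
```

The same function covers both retrieval directions (image-to-report and report-to-text) by transposing the similarity matrix.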
We provide checkpoints for both the retrieval and report generation stages. All models are available from the corresponding HuggingFace repository.
The retrieval model was trained with 16 queries and was used for the retrieval results presented in the paper.
The final report generation models build upon the Stage 1 checkpoint trained with 64 queries and were used for the reader study results.
Final Stage 2 models:
If you find our work useful in your research, please consider citing our paper:
@article{lucassen2025importance,
title={On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation},
author={Lucassen, Ruben T and van de Luijtgaarden, Tijn and Moonemans, Sander P J and Breimer, Gerben E and Blokx, Willeke A M and Veta, Mitko},
year={2025},
eprint={2502.19285},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2502.19285}
}