On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation

This repository contains all code to support the paper:

"On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation"

[arXiv](https://arxiv.org/abs/2502.19285)

Overview

We developed a vision-language model for the pathology domain of melanocytic lesions. The model was trained and evaluated on a dataset of 19,636 melanocytic lesion cases, each consisting of one or more whole slide images (WSIs) and a pathology report. In total, the dataset comprised 42,433 H&E-stained WSIs and 2,132,008 words. We built upon the BLIP-2 framework, using BioGPT as the base language model and HIPT for WSI feature extraction. To evaluate the model, we assessed cross-modal retrieval performance and conducted a reader study to score the quality of the generated reports.
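For intuition, the sketch below shows the general idea of the setup in simplified form: a small set of learned queries cross-attends to precomputed HIPT patch features, producing a fixed number of embeddings that can be matched against report embeddings or passed on to the language model. The dimensions, layer structure, and feature shapes are illustrative assumptions, not the exact configuration used in the paper.

```python
# Simplified, assumption-laden sketch of a BLIP-2-style query pooler over WSI features.
import torch
import torch.nn as nn

class QueryPooler(nn.Module):
    """Learned queries that attend to frozen WSI patch features (Q-Former-like)."""
    def __init__(self, num_queries=16, feat_dim=384, hidden_dim=768, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, hidden_dim))
        self.proj = nn.Linear(feat_dim, hidden_dim)   # map HIPT features to hidden size
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, patch_feats):                   # (B, N_patches, feat_dim)
        kv = self.proj(patch_feats)                   # (B, N_patches, hidden_dim)
        q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        pooled, _ = self.attn(q, kv, kv)              # (B, num_queries, hidden_dim)
        return pooled

# Example: 42 precomputed HIPT region features of dimension 384 for one case.
wsi_feats = torch.randn(1, 42, 384)
pooled = QueryPooler()(wsi_feats)
print(pooled.shape)  # torch.Size([1, 16, 768])
```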

Model Parameters

We provide checkpoints for both the retrieval and report generation stages. All models are available from the corresponding HuggingFace repository.
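As an illustration, checkpoints can be fetched programmatically with the `huggingface_hub` library. The repository ID and filename below are placeholders, not the actual names; substitute the values listed on the HuggingFace page.

```python
# Hypothetical download example; repo_id and filename are placeholders.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<user>/<pathblip2-checkpoints>",   # placeholder repository ID
    filename="stage1_retrieval_16queries.pth",  # placeholder filename
)
print(ckpt_path)  # local path to the downloaded checkpoint
```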

🔍 Stage 1: Retrieval Model

The retrieval model is trained with 16 queries and is used for the retrieval results presented in the paper.
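For reference, cross-modal retrieval is commonly scored with Recall@K over cosine similarities between image and report embeddings. The sketch below implements that standard metric and assumes the paired embeddings have already been produced by the Stage 1 model; it is not taken from this repository's evaluation code.

```python
# Generic image-to-text Recall@K for paired embeddings (row i of each matrix matches).
import torch
import torch.nn.functional as F

def recall_at_k(image_emb, text_emb, k=5):
    img = F.normalize(image_emb, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    sims = img @ txt.t()                        # (N, N) cosine similarities
    topk = sims.topk(k, dim=-1).indices         # k best-matching reports per image
    targets = torch.arange(sims.size(0)).unsqueeze(1)
    return (topk == targets).any(dim=-1).float().mean().item()

image_emb = torch.randn(100, 256)  # placeholder embeddings
text_emb = torch.randn(100, 256)
print(f"Recall@5: {recall_at_k(image_emb, text_emb):.3f}")
```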

📝 Stage 2: Report Generation Models

The final report generation models build upon the Stage 1 checkpoint trained with 64 queries and are used for the reader study results.
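As a minimal illustration of the base language model, the snippet below loads the public BioGPT checkpoint from the `transformers` library and generates free text from a short prompt. In the full Stage 2 model, generation is additionally conditioned on the WSI query embeddings from Stage 1, which is omitted here; the prompt is purely an example.

```python
# Plain BioGPT text generation via transformers (no multimodal conditioning shown).
from transformers import BioGptForCausalLM, BioGptTokenizer

tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

prompt = "Microscopy: sections show a compound melanocytic lesion"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=60, num_beams=4)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```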

Final Stage 2 models:

Citing

If you find our work useful in your research, please consider citing our paper:

```bibtex
@article{lucassen2025importance,
  title={On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation},
  author={Lucassen, Ruben T and van de Luijtgaarden, Tijn and Moonemans, Sander P J and Breimer, Gerben E and Blokx, Willeke A M and Veta, Mitko},
  year={2025},
  eprint={2502.19285},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.19285}
}
```
