Fine-Tuning the CellViT Model with STHELAR, a Xenium-Based Spatial Transcriptomics Dataset

This repository is adapted from the original CellViT repository, described in: Hörst, F. et al. CellViT: Vision Transformers for precise cell segmentation and classification. Medical Image Analysis 94, 103143 (2024). doi: 10.1016/j.media.2024.103143

This repository focuses on fine-tuning the CellViT model on STHELAR, a dataset built from publicly available Spatial Transcriptomics (ST) data generated with the 10x Genomics Xenium platform. The dataset construction process, including H&E image patch extraction, nucleus segmentation, and RNA-based cell-type classification, is detailed in the STHELAR GitHub repository.

The STHELAR dataset includes, in particular:

- H&E image patches
- Corresponding nucleus segmentation masks
- Cell-type annotations derived from RNA information
- Tissue provenance metadata

Data availability:

The full dataset is available online at doi: 10.6019/S-BIAD2146.
A subset of this dataset, containing the H&E image patches and their corresponding masks, is available in Parquet format on Hugging Face for convenient access and use:
- at 40x resolution: doi: 10.57967/hf/6008
- at 20x resolution: doi: 10.57967/hf/6009

The goal is to fine-tune the CellViT model on a large-scale dataset with more precise cell-type classes.

A detailed description of the pipeline, methods, and results can be found in the following article: Giraud-Sauveur, F. et al. STHELAR, a multi-tissue dataset linking spatial transcriptomics and histology for cell type annotation. bioRxiv (2025) doi:10.1101/2025.07.11.664123.

CellViT Model and Codebase Modifications

All original information regarding the CellViT model, including its pre-training and authorship, is maintained in the file README_CellViT.md.

Several modifications have been made to the original codebase, including:

- The data format has been adapted to handle large-scale datasets efficiently.
- Dataset selection has been made more flexible, allowing fine-tuning on different label levels and dataset subsets.
- Additional code has been added to extract further information, such as cell features.
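Fine-tuning at a different label level can be pictured as collapsing fine-grained cell-type labels into coarser groups before training. The sketch below is only an illustration under stated assumptions: the class names and the `LABEL_HIERARCHY` mapping are hypothetical (they are not the STHELAR label set or this repository's configuration format), and index 0 is assumed to denote background in per-pixel type maps.

```python
import numpy as np

# Hypothetical fine -> coarse mapping; NOT the actual STHELAR label hierarchy.
LABEL_HIERARCHY = {
    "T_cell": "Immune",
    "B_cell": "Immune",
    "Fibroblast": "Stromal",
    "Endothelial": "Stromal",
    "Epithelial": "Epithelial",
}


def build_remap(fine_classes, hierarchy):
    """Return the sorted coarse class names and a lookup array fine_id -> coarse_id.

    Fine class i (in list order) gets id i + 1; id 0 is kept as background
    in both label spaces (assumption).
    """
    coarse_classes = sorted({hierarchy[name] for name in fine_classes})
    coarse_index = {name: i + 1 for i, name in enumerate(coarse_classes)}
    lookup = np.zeros(len(fine_classes) + 1, dtype=np.int64)  # background stays 0
    for fine_id, name in enumerate(fine_classes, start=1):
        lookup[fine_id] = coarse_index[hierarchy[name]]
    return coarse_classes, lookup


def remap_type_map(type_map, lookup):
    """Relabel an integer per-pixel cell-type map to the coarse level."""
    return lookup[type_map]
```

Using a lookup array keeps relabeling a single vectorized indexing operation, so it can be applied on the fly in a data loader without rewriting the stored masks.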

New Scripts and Notebooks

The following files have been added to support our dataset preparation and analysis:

In cell_segmentation/datasets:
- convert_into_zip.py: Converts the dataset into ZIP format.
- make_folds_pannuke.py: Creates data splits based on slide selection and patch-level metrics.
- analyse_ds_patches.ipynb: Analyzes the composition and distribution of patches in the dataset.
- get_weights_dataset.ipynb: Computes weights for losses and dataset balancing.
- calculate_mean_std_train.py: Calculates the mean and standard deviation of RGB channels in the training set.
- calculate_loss_extrema.py and analyze_loss_extrema_training.ipynb: Estimate and analyze the range (extrema) of the loss values during training, as well as the loss values obtained in the random case.
- macenko_normationzation(_v2).py: Performs Macenko stain normalization on the dataset.
In cell_segmentation/utils:
- HED_augmentation.py: Stain augmentation specific to H&E slides.
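One plausible reason for converting the dataset into ZIP format (convert_into_zip.py) is that a single archive is easier on cluster filesystems than hundreds of thousands of small files, while individual members can still be read without extracting everything. The sketch below assumes each patch is stored as a `.npy` member named after the patch; the helper names and the member layout are hypothetical and not necessarily what convert_into_zip.py produces.

```python
import io
import zipfile

import numpy as np


def write_patches_to_zip(zip_target, patches):
    """Store each patch array as '<name>.npy' inside one ZIP archive.

    zip_target may be a path or a writable binary file object.
    ZIP_STORED skips compression, which keeps random access cheap.
    """
    with zipfile.ZipFile(zip_target, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, array in patches.items():
            buffer = io.BytesIO()
            np.save(buffer, array)
            zf.writestr(f"{name}.npy", buffer.getvalue())


def read_patch_from_zip(zip_source, name):
    """Read a single patch back without extracting the whole archive."""
    with zipfile.ZipFile(zip_source, "r") as zf:
        with zf.open(f"{name}.npy") as member:
            return np.load(io.BytesIO(member.read()))
```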
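make_folds_pannuke.py creates splits from slide selection and patch-level metrics; its exact criteria are not described here. A common way to do this is sketched below, assuming folds are grouped by slide (so patches from one slide never appear in two folds, which avoids leakage) and using a hypothetical minimum-nuclei threshold as the patch-level filter. The dictionary keys `name`, `slide`, and `n_cells` are illustrative.

```python
from collections import defaultdict


def make_slide_folds(patches, n_folds=3, min_cells=5):
    """Assign patches to folds so that all patches of a slide share one fold.

    patches: list of dicts with hypothetical keys "name", "slide", "n_cells".
    Patches with fewer than min_cells nuclei are dropped (illustrative metric).
    Slides are assigned greedily, largest first, to the fold that currently
    holds the fewest patches, which keeps fold sizes roughly balanced.
    """
    by_slide = defaultdict(list)
    for patch in patches:
        if patch["n_cells"] >= min_cells:
            by_slide[patch["slide"]].append(patch["name"])

    folds = [[] for _ in range(n_folds)]
    # Sort by descending patch count, then by slide id for determinism.
    for slide in sorted(by_slide, key=lambda s: (-len(by_slide[s]), s)):
        target = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[target].extend(by_slide[slide])
    return folds
```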
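get_weights_dataset.ipynb computes weights for the losses and for dataset balancing. The notebook's exact scheme is not given here; the sketch below assumes inverse-frequency class weights (a common choice) and derives a per-patch sampling weight from the classes each patch contains. Function names and inputs are hypothetical.

```python
import numpy as np


def inverse_frequency_weights(class_counts, eps=1e-8):
    """Loss weights proportional to 1 / class frequency, rescaled to mean 1.

    class_counts[k] is the number of nuclei of class k in the training set;
    rare classes receive larger weights.
    """
    counts = np.asarray(class_counts, dtype=np.float64)
    weights = 1.0 / (counts + eps)
    return weights * len(weights) / weights.sum()


def patch_sampling_weights(patch_class_counts, class_weights):
    """Per-patch sampling weight: class weights averaged over the patch's nuclei.

    patch_class_counts has shape (n_patches, n_classes). Patches without any
    nucleus get weight 0 and would never be drawn by a weighted sampler.
    """
    counts = np.asarray(patch_class_counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    fractions = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return fractions @ np.asarray(class_weights, dtype=np.float64)
```

In PyTorch, per-patch weights like these can be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)` so that patches rich in rare classes are drawn more often.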
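calculate_mean_std_train.py computes per-channel RGB statistics over the training set, typically used to normalize network inputs. A minimal streaming version is sketched below, assuming uint8 patches of shape (H, W, 3) and statistics reported on the 0-1 scale; the repository may use a different scale or data source. Accumulating sums instead of stacking patches keeps memory constant regardless of dataset size.

```python
import numpy as np


def channel_mean_std(patches):
    """Per-channel mean and standard deviation over an iterable of RGB patches.

    Each patch is an (H, W, 3) uint8 array. Sums are accumulated in float64,
    so the training set never has to be held in memory at once.
    """
    n_pixels = 0
    channel_sum = np.zeros(3, dtype=np.float64)
    channel_sq_sum = np.zeros(3, dtype=np.float64)
    for patch in patches:
        pixels = patch.reshape(-1, 3).astype(np.float64) / 255.0
        n_pixels += pixels.shape[0]
        channel_sum += pixels.sum(axis=0)
        channel_sq_sum += (pixels ** 2).sum(axis=0)
    mean = channel_sum / n_pixels
    # Population variance Var[X] = E[X^2] - E[X]^2; clip rounding noise below 0.
    var = np.maximum(channel_sq_sum / n_pixels - mean ** 2, 0.0)
    return mean, np.sqrt(var)
```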
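calculate_loss_extrema.py estimates the range of loss values and their value in the random case. The sketch below covers only categorical cross-entropy (the repository combines several losses) and assumes "random case" means an uninformed predictor. Two such references have closed forms: a uniform prediction gives ln(C) for C classes, and always predicting the empirical class frequencies gives the entropy of the label distribution. The lower bound is 0 for perfect, fully confident predictions.

```python
import numpy as np


def cross_entropy(probs, labels, eps=1e-12):
    """Mean categorical cross-entropy of predicted probabilities of shape (N, C)."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0)
    labels = np.asarray(labels)
    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())


def random_case_cross_entropy(labels, n_classes):
    """Reference cross-entropy values for uninformed predictors."""
    labels = np.asarray(labels)
    n = len(labels)
    uniform = np.full((n, n_classes), 1.0 / n_classes)
    prior = np.bincount(labels, minlength=n_classes) / n
    return {
        "lower_bound": 0.0,                                      # perfect predictions
        "uniform": cross_entropy(uniform, labels),               # equals ln(C)
        "prior": cross_entropy(np.tile(prior, (n, 1)), labels),  # label entropy
    }
```

The prior-based value is never above ln(C), so on imbalanced data it is the tighter "chance level" to compare training curves against.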
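Macenko normalization (Macenko et al., 2009) estimates two stain vectors from the optical-density (OD) distribution of an image, then rescales stain concentrations to match a reference. The NumPy sketch below follows the widely used formulation (I0 = 240, transparency threshold beta = 0.15, alpha = 1 percentile angles); these defaults are common choices and not necessarily those of macenko_normationzation(_v2).py. One deliberate variant: both plane axes are oriented toward the mean OD so that projected angles cannot wrap around at +/- pi.

```python
import numpy as np


def _optical_density(img, io=240.0):
    """RGB (H, W, 3) -> optical density (N, 3), Beer-Lambert with a +1 offset."""
    rgb = img.reshape(-1, 3).astype(np.float64)
    return -np.log((rgb + 1.0) / io)


def _fit(img, io, alpha, beta):
    """Return the (3, 2) stain matrix (hematoxylin first) and (2, N) concentrations."""
    od = _optical_density(img, io)
    od_hat = od[~np.any(od < beta, axis=1)]           # drop near-transparent pixels
    _, eigvecs = np.linalg.eigh(np.cov(od_hat.T))      # eigenvalues in ascending order
    plane = eigvecs[:, 1:3]                            # two leading directions
    # Orient both axes toward the mean OD so angles stay away from +/- pi.
    plane = plane * np.where(od_hat.mean(axis=0) @ plane < 0, -1.0, 1.0)
    proj = od_hat @ plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    lo, hi = np.percentile(phi, [alpha, 100 - alpha])
    v_lo = plane @ np.array([np.cos(lo), np.sin(lo)])
    v_hi = plane @ np.array([np.cos(hi), np.sin(hi)])
    # Heuristic: hematoxylin has the larger optical density in the red channel.
    stains = np.stack([v_lo, v_hi] if v_lo[0] > v_hi[0] else [v_hi, v_lo], axis=1)
    conc = np.linalg.lstsq(stains, od.T, rcond=None)[0]
    return stains, conc


def estimate_stains(img, io=240.0, alpha=1.0, beta=0.15):
    """Stain matrix (3, 2) and 99th-percentile concentrations (2,) of an image."""
    stains, conc = _fit(img, io, alpha, beta)
    return stains, np.percentile(conc, 99, axis=1)


def macenko_normalize(img, target_stains, target_max_conc, io=240.0, alpha=1.0, beta=0.15):
    """Map img onto the target stain matrix and concentration range."""
    h, w, _ = img.shape
    _, conc = _fit(img, io, alpha, beta)
    conc = conc * (target_max_conc / np.percentile(conc, 99, axis=1))[:, None]
    out = io * np.exp(-(target_stains @ conc)) - 1.0   # invert the +1 offset
    return np.clip(np.rint(out.T.reshape(h, w, 3)), 0, 255).astype(np.uint8)
```

In practice the target stain matrix and concentrations are estimated once, with `estimate_stains`, on a chosen reference patch and then reused for every patch in the dataset.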
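HED augmentation perturbs an image in stain space rather than RGB, so color changes stay plausible for H&E. HED_augmentation.py itself is not reproduced here; the sketch below assumes a common formulation in which each stain channel c becomes alpha * c + beta with alpha ~ U(1 - sigma, 1 + sigma) and beta ~ U(-sigma, sigma). The stain vectors are the Ruifrok and Johnston values also used by scikit-image's `rgb2hed`/`hed2rgb`; `sigma` and the function name are illustrative.

```python
import numpy as np

# Rows: hematoxylin, eosin and DAB optical-density vectors (Ruifrok & Johnston).
RGB_FROM_HED = np.array([[0.65, 0.70, 0.29],
                         [0.07, 0.99, 0.11],
                         [0.27, 0.57, 0.78]])
HED_FROM_RGB = np.linalg.inv(RGB_FROM_HED)


def hed_augment(img, sigma=0.05, rng=None):
    """Randomly perturb the stain concentrations of an RGB uint8 image."""
    rng = np.random.default_rng() if rng is None else rng
    rgb = np.maximum(img.astype(np.float64) / 255.0, 1e-6)  # avoid log(0)
    od = -np.log(rgb)                                        # Beer-Lambert optical density
    hed = od @ HED_FROM_RGB                                  # per-pixel stain concentrations
    alpha = rng.uniform(1 - sigma, 1 + sigma, size=3)
    beta = rng.uniform(-sigma, sigma, size=3)
    hed = hed * alpha + beta
    out = np.exp(-(hed @ RGB_FROM_HED))                      # back to RGB in [0, 1]
    return np.clip(np.rint(out * 255.0), 0, 255).astype(np.uint8)
```

With `sigma=0` the transform is an exact round trip, which is a convenient sanity check before tuning the augmentation strength.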
