CytoCommunity+

Overview

To enhance CytoCommunity (https://github.com/huBioinfo/CytoCommunity), we present CytoCommunity+, a unified weakly-supervised framework for identifying and comparing tissue cellular neighborhoods (TCNs or CNs) across large-scale spatial omics samples with single or multiple biological conditions.

Inspired by histopathology workflows, CytoCommunity+ first hierarchically partitions the large single-cell spatial map into small patches, performs graph construction and weakly supervised TCN learning for each patch, and finally merges results through KNN-based TCN reassignment at segmentation boundaries to ensure TCN spatial continuity. This strategy divides the original sample into patches for TCN learning, achieving memory efficiency (typical 24G graphics memory is enough) and also increased sample throughput. These optimizations significantly enhance the robustness of TCNs and cross-sample alignment performance.

Furthermore, to make CytoCommunity+ a unified framework that is also applicable for single-condition spatial omics datasets, pseudo-samples with artificial labels are generated, enabling automatic TCN alignment across real samples via contrastive learning.

In summary, the advantages of CytoCommunity+ include:

(1) Using significantly less memory for large-scale spatial omics samples with millions of cells.

(2) A unified weakly-supervised model applicable for both multi-condition and single-condition datasets.

(3) High TCN alignment performance makes it well-suited for comparative analysis in large cohort studies.

Installation

Hardware requirement

(Graphics) Memory: 24G

Storage: 10GB or more

Software requirement

Conda version: 22.9.0

Python version: 3.10.6

R version: >= 4.0 suggested

Clone this repository and cd into it as below.

git clone https://github.com/LiukangWu/CytoCommunity-plus.git
cd CytoCommunity-plus

Set up virtual environment for Windows

Create a new conda environment using the environment_windows_cpu.yml file (CPU version) and activate it:
```
conda env create -f environment_windows_cpu.yml
conda activate CytoCommunity_cpu
```
Or create a new conda environment using the environment_windows_gpu.yml file (GPU version) and activate it:
```
conda env create -f environment_windows_gpu.yml
conda activate CytoCommunity_gpu
```
Install the diceR package (R has already been included in the requirements) with the following command:
```
R.exe
> install.packages("diceR")
```

Set up virtual environment for Linux

Create a new conda environment using the environment_linux_cpu.yml file (CPU version) and activate it:
```
conda env create -f environment_linux_cpu.yml
conda activate CytoCommunity_cpu
```
Or create a new conda environment using the environment_linux_gpu.yml file (GPU version) and activate it:
```
conda env create -f environment_linux_gpu.yml
conda activate CytoCommunity_gpu
```

Install R and the diceR package:

conda install R
R
> install.packages("diceR")

The whole installation should take around 20 minutes.

Usage

Prepare input data

The input data to CytoCommunity+ includes four types of files (refer to "CODEX_SpleenDataset/"):

(1) An image (sample) name list file, named as "ImageNameList.txt".

(2) A cell type label file for each image (sample), named as "[image name]_CellTypeLabel.txt". Note that [image_name] should be consistent with your customized image names listed in the "ImageNameList.txt". This file lists cell type names of all cells in an image (sample).

(3) A cell spatial coordinate file for each image (sample), named as "[image name]_Coordinates.txt". Note that [image_name] should be consistent with your customized image names listed in the "ImageNameList.txt". This file lists cell coordinates (tab-delimited x/y) of all cells in an image (sample). The cell orders should be exactly the same with "[image name]_CellTypeLabel.txt".

(4) (Optional, for multi-condition datasets only) A graph label file for each image (sample), named as "[image name]_GraphLabel.txt". Note that [image_name] should be consistent with your customized image names listed in the "ImageNameList.txt". This file contains an integer label (e.g., "0", "1", "2", etc) that indicates the condition of each image (sample) in the weakly-supervised learning framework. !!Must begin with 0.

Ready to run

Step 0: (Optional, for single-condition datasets only) Generate pseudo-spatial maps by shuffling cell types in real spatial maps.

This step generates a folder "Step0_Output" containing pseudo-spatial maps created by randomly shuffling cell type labels while maintaining original spatial coordinates. Each pseudo-sample will have corresponding "pseudo" suffixed files alongside the original samples.

python Step0_GeneratePseudoMaps.py

Hyperparameters

InputFolderName: The folder name of your original input dataset.

Step 1: Split large spatial maps into small manageable patches.

This step generates a folder "Step1_Output" containing spatially splitted patches for each original image (sample) along with their corresponding coordinate files, cell type label files, and graph label files, as well as a global "All_Boundary.txt" file that records all splitting boundaries and an "ImagePatchNameList.txt" file that catalogs all generated patches. The recursive splitting process ensures large tissue images (samples) are divided into smaller, more manageable patches while maintaining all original cellular information and spatial relationships.

python Step1_SplitSpatialMaps.py

Hyperparameters

CellPatchNum: Maximum cell count threshold (default=50,000) triggering recursive splitting.
MinCellCount_Patch: Minimum cell count (default=20) required to keep a generated patch.
InputFolderName: Path to input dataset folder (default="./Step0_Output/"). !!Change it to the original input directory for multi-condition datasets.

Step 2: Construct KNN-based cellular spatial graghs of all patches and convert them to the standard format required by Torch.

This step generates a folder "Step2_Output" including constructed cellular spatial graphs of all patches.

python Step2_ConstructCellularSpatialGraphs.py

Hyperparameters

KNN_K: The K value (default=50; To identify ≥10 TCNs, a value of 20 is suggested) used in the construction of the K nearest neighbor graph (cellular spatial graph) for each patch.

Step 3: Perform soft TCN assignment learning in a weakly-supervised fashion.

This step generates a folder "Step3_Output" containing results from multiple independent runs of the weakly-supervised TCN learning process. Each run folder includes training loss logs and output matrices (cluster assignment matrix, cluster adjacency matrix, and node mask) for all patches. The model combines graph partitioning (MinCut loss) with graph classification (cross-entropy loss) in an end-to-end training framework.

python Step3_TCN-Learning_WeaklySupervised.py

Hyperparameters

Num_TCN: Maximum number of TCNs (default=4) to identify.
Num_Run: Number of independent training runs (default=10).
Num_Epoch: Training epochs per run (default=400).
Num_Class: Number of tissue image (sample) conditions (default=2).
Embedding_Dimension: Embedding dimension (default=128).
MiniBatchSize: This value is commonly set to be powers of 2 due to efficiency consideration (default=2).
LearningRate: Optimizer learning rate (default=0.001).
beta: A weight parameter to balance the MinCut loss used for graph partitioning and the cross-entropy loss used for graph classification. The default value is set to [0.9] due to emphasis on graph partitioning.

Step 4: Perform TCN assignment ensemble.

The results of this step are saved under the "Step4_Output/ImageCollection/" directory. A "TCNLabel_MajorityVoting.csv" file will be generated for each patch.

Rscript Step4_TCN-Ensemble.R

Hyperparameters

NONE

Step 5: Refine TCN membership at segmentation boundaries and generate final visualizations.

This step generates a folder "Step5_Output" containing four subfolders with comprehensive results: "TCN_Plot" storing spatial maps colored by identified TCNs (in PNG and PDF formats), "CellRefinement_Plot" showing boundary refinement results, "ResultTable_File" containing detailed TCN identification results in CSV format, and "CellType_Plot" storing spatial maps colored by original cell type annotations.

python Step5_BoundaryRefinement.py

Hyperparameters

KNN_K: Number of nearest neighboring cells (default=50) used for boundary refinement.
Num_TCN: Maximum number of TCNs (default=4) for consistent coloring.
Smoothing_range: Spatial range (default=50μm) for boundary refinement.
InputFolderName: Path to input dataset folder (default="./Step0_Output/"). !!Change it to the original input directory for multi-condition datasets.

Demo

Applied to healthy mouse spleen spatial proteomics data, CytoCommunity+ demonstrates performance comparable to CytoCommunity while enabling automatic TCN alignment across samples (i.e., colors are matched) with much lower memory consumption. Note that most deep learning-based unsupervised methods like CytoCommunity (unsupervised version) process images (samples) individually and thus TCNs are not aligned across samples (i.e., colors are NOT matched), which hinders comparative analysis.

Update Log

Maintainers

Liukang Wu (yetong@stu.xidian.edu.cn)

Yafei Xu (22031212416@stu.xidian.edu.cn)

Yuxuan Hu (huyuxuan@xidian.edu.cn)

Citation

Yuxuan Hu, Jiazhen Rong, Yafei Xu, Runzhi Xie, Jacqueline Peng, Lin Gao, Kai Tan. Unsupervised and supervised discovery of tissue cellular neighborhoods from cell phenotypes. Nature Methods, 2024, 21:267–278 https://doi.org/10.1038/s41592-023-02124-2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

CytoCommunity+

Contents

Overview

Installation

Hardware requirement

Software requirement

Set up virtual environment for Windows

Set up virtual environment for Linux

Usage

Prepare input data

Ready to run

Step 0: (Optional, for single-condition datasets only) Generate pseudo-spatial maps by shuffling cell types in real spatial maps.

Step 1: Split large spatial maps into small manageable patches.

Step 2: Construct KNN-based cellular spatial graghs of all patches and convert them to the standard format required by Torch.

Step 3: Perform soft TCN assignment learning in a weakly-supervised fashion.

Step 4: Perform TCN assignment ensemble.

Step 5: Refine TCN membership at segmentation boundaries and generate final visualizations.

Demo

Update Log

Maintainers

Citation

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
CODEX_SpleenDataset		CODEX_SpleenDataset
support		support
README.md		README.md
Step0_GeneratePseudoSpatialMaps.py		Step0_GeneratePseudoSpatialMaps.py
Step1_SplitSpatialMaps.py		Step1_SplitSpatialMaps.py
Step2_ConstructCellularSpatialGraphs.py		Step2_ConstructCellularSpatialGraphs.py
Step3_TCN-Learning_WeaklySupervised.py		Step3_TCN-Learning_WeaklySupervised.py
Step4_TCN-Ensemble.R		Step4_TCN-Ensemble.R
Step5_TCN-BoundaryRefinement-Visualization.py		Step5_TCN-BoundaryRefinement-Visualization.py
environment_linux_cpu.yml		environment_linux_cpu.yml
environment_linux_gpu.yml		environment_linux_gpu.yml
environment_windows_cpu.yml		environment_windows_cpu.yml
environment_windows_gpu.yml		environment_windows_gpu.yml

LiukangWu/CytoCommunity-plus

Folders and files

Latest commit

History

Repository files navigation

CytoCommunity+

Contents

Overview

Installation

Hardware requirement

Software requirement

Set up virtual environment for Windows

Set up virtual environment for Linux

Usage

Prepare input data

Ready to run

Step 0: (Optional, for single-condition datasets only) Generate pseudo-spatial maps by shuffling cell types in real spatial maps.

Step 1: Split large spatial maps into small manageable patches.

Step 2: Construct KNN-based cellular spatial graghs of all patches and convert them to the standard format required by Torch.

Step 3: Perform soft TCN assignment learning in a weakly-supervised fashion.

Step 4: Perform TCN assignment ensemble.

Step 5: Refine TCN membership at segmentation boundaries and generate final visualizations.

Demo

Update Log

Maintainers

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages