k-NN-based mapping of cells across representations to transfer labels, embeddings and expression values. Works for millions of cells, on CPU and GPU, across molecular modalities, between spatial and non-spatial data, for arbitrary query and reference datasets. Using faiss to compute k-NN graphs, CellMapper takes about 30 seconds to transfer cell type labels from 1.5M cells to 1.5M cells on a single RTX 4090 with 60 GB CPU memory.
Inspired by previous tools, including scanpy's ingest and the HNOCA-tools packages. Check out the 📚 docs to learn more, in particular our tutorials.
- 🧬 Transfer cell type labels and expression values from dissociated to spatial datasets.
- ↔️ Transfer embeddings between arbitrary query and reference datasets.
- 📊 Compute presence scores for query datasets in large reference atlases.
- 🗺️ Identify niches in spatial datasets by contextualizing latent spaces in spatial coordinates.
- 📈 Evaluate the results of transferring labels, embeddings and feature spaces using a variety of metrics.
The core idea of CellMapper is to separate the method (building a k-NN graph and applying a kernel to obtain a mapping matrix) from the application (mapping across arbitrary representations), which keeps the tool flexible and fast. CellMapper currently supports pynndescent, sklearn, faiss and rapids for neighborhood search, implements a variety of graph kernels, and is closely integrated with AnnData objects.
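To make the "k-NN graph plus kernel gives a mapping matrix" idea concrete, here is a minimal NumPy sketch. It is not CellMapper's implementation: the function names `knn_mapping_matrix` and `transfer_labels`, the brute-force neighbor search, and the Gaussian kernel with a per-cell bandwidth are all assumptions chosen for illustration.

```python
import numpy as np


def knn_mapping_matrix(query_emb, ref_emb, k=3):
    """Hypothetical helper: row-normalized (n_query, n_ref) mapping matrix."""
    # Brute-force squared Euclidean distances between all query and reference cells
    d2 = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=-1)
    # Indices and distances of the k nearest reference cells per query cell
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    nn_d2 = np.take_along_axis(d2, nn_idx, axis=1)
    # Gaussian kernel; bandwidth set by the distance to the k-th neighbor (an assumption)
    sigma2 = nn_d2[:, -1:] + 1e-12
    weights = np.exp(-nn_d2 / sigma2)
    # Scatter the kernel weights into a dense matrix and normalize each row to sum to 1
    mapping = np.zeros_like(d2)
    np.put_along_axis(mapping, nn_idx, weights, axis=1)
    return mapping / mapping.sum(axis=1, keepdims=True)


def transfer_labels(mapping, ref_labels):
    """Hypothetical helper: weighted vote over one-hot encoded reference labels."""
    categories = np.unique(ref_labels)
    one_hot = (ref_labels[:, None] == categories[None, :]).astype(float)
    scores = mapping @ one_hot  # (n_query, n_categories) label probabilities
    return categories[scores.argmax(axis=1)], scores.max(axis=1)


# Toy data: two well-separated clusters in a 2D "joint embedding"
rng = np.random.default_rng(0)
ref_emb = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
ref_labels = np.array(["A"] * 20 + ["B"] * 20)
query_emb = np.vstack([rng.normal(0.0, 0.3, (5, 2)), rng.normal(5.0, 0.3, (5, 2))])

mapping = knn_mapping_matrix(query_emb, ref_emb, k=3)
pred, conf = transfer_labels(mapping, ref_labels)
```

The same `mapping` matrix can transfer continuous data as a weighted average, for example `mapping @ ref_values` for an embedding or an expression matrix. For millions of cells, a dense matrix and brute-force search would not scale; that is where sparse matrices and approximate neighbor libraries come in.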
You need to have 🐍 Python 3.10 or newer installed on your system. If you don't have Python installed, we recommend installing uv.
There are two alternative options to install cellmapper:
- 🚀 Install the latest release from PyPI:
    pip install cellmapper
- 🛠️ Install the latest development version:
    pip install git+https://github.com/quadbio/cellmapper.git@main
This package assumes that you have query and reference AnnData objects, with a joint embedding computed and stored in .obsm. While we implement some baseline approaches to compute joint embeddings (PCA and a fast reimplementation of CCA), we typically expect you to provide a pre-computed joint embedding from some task-specific representation learning tools, e.g. GimVI or ENVI for spatial mapping, GLUE, MIDAS and MOFA+ for modality translation, and scVI, scANVI and scArches for query-to-reference mapping. This is just a small selection!
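As a rough illustration of what a PCA-style joint embedding baseline can look like, the sketch below stacks query and reference cells on their shared features and projects both onto the same principal components. This is a plain NumPy example under stated assumptions, not CellMapper's own baseline: the function name `joint_pca` is hypothetical, and the inputs are assumed to be dense, already normalized matrices with identical feature columns.

```python
import numpy as np


def joint_pca(query_x, ref_x, n_comps=10):
    """Hypothetical helper: PCA fitted on the concatenation of query and reference cells."""
    n_query = query_x.shape[0]
    # Stack cells from both datasets; columns must be the same features in the same order
    stacked = np.vstack([query_x, ref_x])
    # Center each feature across all cells so both datasets share one coordinate system
    centered = stacked - stacked.mean(axis=0, keepdims=True)
    # Thin SVD; principal component scores are U * S
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_comps] * s[:n_comps]
    return scores[:n_query], scores[n_query:]


# Toy expression matrices with 50 shared features
rng = np.random.default_rng(0)
query_x = rng.normal(size=(30, 50))
ref_x = rng.normal(size=(40, 50))

query_pcs, ref_pcs = joint_pca(query_x, ref_x, n_comps=10)
```

With real AnnData objects, the two arrays could then be stored as `query.obsm["X_joint"] = query_pcs` and `reference.obsm["X_joint"] = ref_pcs`. A joint PCA does not correct batch or modality effects, which is why a task-specific model is usually the better choice.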
With a joint embedding in .obsm["X_joint"] at hand, the simplest way to use CellMapper is as follows:
from cellmapper import CellMapper

# query and reference are AnnData objects sharing a joint embedding in .obsm["X_joint"]
cmap = CellMapper(query, reference).map(
    use_rep="X_joint", obs_keys="celltype", obsm_keys="X_umap", layer_key="X"
)
This will transfer data from the reference to the query dataset, including cell type labels stored in reference.obs, a UMAP embedding stored in reference.obsm, and expression values stored in reference.X.
There are many ways to customize this, e.g. by using different methods to compute k-NN graphs and to turn them into mapping matrices, and we implement a few methods to evaluate whether your k-NN transfer was successful. The tool also implements a self-mapping mode (only a query object, no reference), which is useful for spatial contextualization and data denoising. Check out the 📚 docs to learn more.
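To give an intuition for why mapping a dataset onto itself can denoise it, the following sketch averages each cell's values over its nearest neighbors in an embedding. It is a conceptual NumPy example only: the function `self_smooth`, the uniform (unweighted) kernel, and the brute-force neighbor search are assumptions, and CellMapper's self-mapping mode may work differently.

```python
import numpy as np


def self_smooth(emb, values, k=15):
    """Hypothetical helper: average values over each cell's k nearest neighbors."""
    # Brute-force squared distances between all pairs of cells in the same dataset
    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=-1)
    # k nearest neighbors per cell; the cell itself is included (distance 0)
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    # Uniform kernel: plain mean over the neighborhood
    return values[nn_idx].mean(axis=1)


# Toy data: cells ordered along a 1D trajectory with a smooth signal plus noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
emb = t[:, None]
signal = np.sin(2 * np.pi * t)[:, None]
noisy = signal + rng.normal(0.0, 0.5, size=signal.shape)

smoothed = self_smooth(emb, noisy, k=15)
mse_before = float(((noisy - signal) ** 2).mean())
mse_after = float(((smoothed - signal) ** 2).mean())
```

Using spatial coordinates instead of a latent embedding for `emb` gives the spatial contextualization variant of the same idea: each cell's representation is averaged over its physical neighborhood.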
See the changelog.
If you found a bug, please use the issue tracker.
Please use our zenodo entry to cite this software.