PEtracer 2025

This repository contains the code to reproduce all analyses and figures from the manuscript "High-resolution spatial mapping of cell state and lineage dynamics in vivo with PEtracer".

Setup

Python environment

conda env create --file environment.yml
conda activate petracer
ipython kernel install --user --name petracer

The environment.lock.yml file can be used to recreate the environment with the exact package versions used in the paper.

Image processing environment

Image processing was performed on a Linux HPC cluster with the following software installed:

Data availability

Processed data is available on Figshare
Single-cell RNA-seq data is available on GEO
All other sequencing data is available on SRA

Simulation

The simulation directory contains code for simulating lineage tracing data with a variety of parameters. To run simulations:

python simulation/simulate.py

To generate simulation plots:

python simulation/plot.py

Prime editing strategy selection

The insertvariants and RTT_optimization directories contain code for processing and analyzing amplicon sequencing data used to select edit sites and optimize editing strategies for the PEtracer system.

Data processing

Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running

./crispresso.sh
Rscript ../../scripts/make_CRISPResso_summary.R ./ CRISPResso_summary.txt

after downloading the fastq files listed in manifest.txt from SRA and placing them in the strategy_selection/insertvariants/fastq directory.
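
Before launching crispresso.sh, it can help to confirm that every file named in manifest.txt has actually been placed in the fastq directory. The sketch below assumes that the first whitespace-separated field on each line of manifest.txt is a fastq file name and that manifest.txt sits next to the fastq directory; neither detail is described here, so adjust the parsing and paths if they differ. The function name missing_fastqs is hypothetical and not part of this repository.

```python
from pathlib import Path


def missing_fastqs(manifest: Path, fastq_dir: Path) -> list[str]:
    """Return manifest entries that are not present in fastq_dir.

    Assumption: the first whitespace-separated field of each non-empty,
    non-comment line in the manifest is a fastq file name.
    """
    missing = []
    for line in manifest.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name = line.split()[0]
        if not (fastq_dir / name).is_file():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed layout; run from the repository root.
    base = Path("strategy_selection/insertvariants")
    manifest = base / "manifest.txt"
    if manifest.is_file():
        for name in missing_fastqs(manifest, base / "fastq"):
            print(f"missing: {name}")
```

The same helper can be pointed at the manifest.txt and fastq directory of any other section that follows this download-then-process pattern.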

Analysis

plot.ipynb - generate strategy selection plots.

5nt insert selection

The insert_selection directory contains code for processing and analyzing target site sequencing data used to determine the installation efficiencies of all 1024 5nt insertions for each edit site.
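
The figure of 1024 follows from four possible bases at each of five positions (4^5 = 1024). The short sketch below enumerates the candidate insertions; the variable names are illustrative and not taken from the repository.

```python
from itertools import product

BASES = "ACGT"
INSERT_LENGTH = 5

# Every length-5 sequence over the four DNA bases, in lexicographic order.
inserts = ["".join(p) for p in product(BASES, repeat=INSERT_LENGTH)]

print(len(inserts))   # 1024
print(inserts[:3])    # ['AAAAA', 'AAAAC', 'AAAAG']
```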

Data processing

Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running

./crispresso.sh

after downloading the fastq files listed in manifest.txt from SRA and placing them in the insert_selection/fastq directory.

aggregate_crispresso.ipynb - aggregate CRISPResso output files for all sites.
crosshyb.py - estimate 5nt insert cross-hybridization

Analysis

To generate insert selection plots:

python insert_selection/plot.py

Insert validation

The insert_validation directory contains code for processing and analyzing amplicon sequencing data used for arrayed validation of the top 5nt insertions for each edit site.

Data processing

Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running

./crispresso.sh
Rscript ../scripts/make_CRISPResso_summary.R ./ CRISPResso_summary.txt

after downloading the fastq files listed in manifest.txt from SRA and placing them in the insert_validation/fastq directory.

Analysis

plot.ipynb - generate arrayed validation plots for top 5nt insertions.

Orthogonalization

The orthogonalization directory contains code for processing and analyzing amplicon sequencing data used for validating orthogonalized versions of the RNF2, HEK3, and EMX1 edit sites.

Data processing

Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running

./crispresso.sh
Rscript ../scripts/make_CRISPResso_summary.R ./ CRISPResso_summary.txt

after downloading the fastq files listed in manifest.txt from SRA and placing them in the orthogonalization/fastq directory.

Analysis

plot.ipynb - generate orthogonalization plots.

Orthogonal insert validation

The orthogonal_insert_validation directory contains code for processing and analyzing amplicon sequencing data used for validating the top 20 5nt insertions at each orthogonalized edit site.

Data processing

Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running

./crispresso.sh
Rscript ../scripts/make_CRISPResso_summary.R ./ CRISPResso_summary.txt

after downloading the fastq files listed in manifest.txt from SRA and placing them in the orthogonal_insert_validation/fastq directory.

Analysis

plot.ipynb - generate plots for top 20 5nt insertions.

pegArray balance

The peg_arrays directory contains code for processing and analyzing target site sequencing data used to determine the lineage mark (LM) installation balance for various pegArrays.

Data processing

Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running

sbatch peg_arrays/crispresso.slurm
python peg_arrays/count_alleles.py

after downloading the fastq files listed in manifest.txt from SRA and placing them in the peg_arrays/fastq directory.

Analysis

To generate pegArray plots:

python peg_arrays/plot.py

pegRNA variant kinetics

The kinetics directory contains code for processing and analyzing 10x data for 4T1 and B16F10 cells transduced with a library of pegRNA variants to test editing kinetics.

Data processing

10x data was processed on a Linux HPC cluster with SLURM, Python 3.11, and Cellranger 7.1.0 installed. After downloading the 10x fastq files listed in manifest.txt from GEO and placing them in the kinetics/fastq directory, processed files can be generated with the following steps:

Run Cellranger and call alleles using bam files.

sbatch kinetics/cellranger.slurm
sbatch kinetics/call_alleles.slurm

process_4T1_10x.ipynb - perform quality control, call pegRNA variants, and determine edit fraction for 4T1 cells.
process_B16F10_10x.ipynb - perform quality control, call pegRNA variants, and determine edit fraction for B16F10 cells.

Analysis

All kinetics analysis and plots can be generated by running

python kinetics/estimate_rate.py
python kinetics/plot.py

after processing the raw data or downloading the processed files from Figshare and placing them in the kinetics/data directory:

4T1_kinetics_alleles.csv
4T1_kinetics_cells.csv
4T1_kinetics.h5ad
B16F10_kinetics_alleles.csv
B16F10_kinetics_cells.csv
B16F10_kinetics.h5ad

Integration barcode design

Jupyter notebooks detailing the integration barcode design are in the design_intBC folder.

Probe design

Detailed MERFISH and PEtracer probe designs are in the design_probes folder. This part requires the MERFISH_probe_design package to be installed.

Image processing

The image_processing directory contains code for processing imaging data. Raw imaging files are not publicly available due to file size, but code can be used to process other imaging data in the same format. Processed files for each experiment (e.g. 241213_F320-4-3_MF4++) can be generated with the following steps:

1. Nuclei segmentation using Cellpose and Deconwolf

sbatch image_processing/241213_F320-4-3_MF4++/Scripts/cellpose.slurm

2. MERFISH transcript decoding using MERLin

Download the newest version of MERLin (v0.1.8) and install it:
conda create -n merlin_py310 python=3.10
conda activate merlin_py310
conda install h5py rtree pytables setuptools urllib3 python-dotenv pandas tifffile
conda install scikit-image scikit-learn scipy matplotlib networkx seaborn
conda install pytest pytest-cov numexpr cython requests boto3 xmltodict google-cloud-storage docutils pillow
pip install opencv-python pyqt5 sphinx-rtd-theme snakemake pyclustering tables cellpose
pip install -e MERLin
Test that the installation works:
merlin -h
Before using MERLin for the first time, configure it:
merlin --configure .
Then follow the instructions.
Run MERLin. Example command:
merlin -a 20241007-MF4_TestPreprocess.json \
    -o 20240812-MF4_16bit.csv \
    -c MF4dna_codebook.csv \
    -m merscope01_microscope.json \
    -p 20240812_positions.txt \
    -e /lab/weissman_imaging/puzheng/4T1Tumor \
    -s /lab/weissman_imaging/puzheng/MERFISH_analysis/4T1 \
    -k run_MF4_cellpose.json \
    -n 2 \
    --no_report True \
    20240812-F319-12-0807_MF4dna-mCh
Example parameter files are provided in the merlin_parameters folder. Keep the subfolder structure intact and, in the configuration step, set PARAMETER_HOME to the absolute path of this merlin_parameters folder.

3. Assignment of cytoplasmic transcripts to nuclei using Proseg

sbatch image_processing/241213_F320-4-3_MF4++/Scripts/proseg.slurm

4. Alignment of MERFISH and lineage imaging data using fishtank

sbatch image_processing/241213_F320-4-3_MF4++/Scripts/align_experiments.slurm

5. T7 amplicon detection and quantification using fishtank

sbatch image_processing/241213_F320-4-3_MF4++/Scripts/detect_spots.slurm

6. T7 amplicon decoding and cell assignment using fishtank

sbatch image_processing/241213_F320-4-3_MF4++/Scripts/decode_spots.slurm

This process was repeated for each imaging experiment, except for experiments without MERFISH data, which only required steps 1, 5, and 6.
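
The step selection described above can be written down as a small helper that lists, for one experiment, the SLURM scripts to submit. This is only a sketch: the function name slurm_scripts and the has_merfish flag are hypothetical, the script names are those shown in the steps above, and the MERLin decoding step is left out because it is launched with the merlin command rather than sbatch.

```python
from pathlib import Path

# Step number -> SLURM script, as listed in the steps above.
# Step 2 (MERLin decoding) is run with the merlin command and is omitted.
SLURM_STEPS = {
    1: "cellpose.slurm",
    3: "proseg.slurm",
    4: "align_experiments.slurm",
    5: "detect_spots.slurm",
    6: "decode_spots.slurm",
}
# Experiments without MERFISH data only need these steps.
LINEAGE_ONLY_STEPS = (1, 5, 6)


def slurm_scripts(experiment: str, has_merfish: bool = True) -> list[str]:
    """Return the SLURM scripts to submit, in order, for one experiment."""
    steps = sorted(SLURM_STEPS) if has_merfish else LINEAGE_ONLY_STEPS
    scripts_dir = Path("image_processing") / experiment / "Scripts"
    return [(scripts_dir / SLURM_STEPS[s]).as_posix() for s in steps]


# Dry run: print the commands instead of submitting them.
for script in slurm_scripts("241213_F320-4-3_MF4++", has_merfish=False):
    print("sbatch", script)
```

Printing the commands rather than calling sbatch directly keeps the sketch safe to run on a machine without SLURM; each step should finish before the next one is submitted.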

Predefined lineage mark validation

The preedited directory contains code for processing and analyzing 10x and imaging-based readout of lineage tracing data from cells with predefined linkage between intBCs and lineage marks.

Data processing

Imaging data was processed as described in the "Image processing" section. 10x data was processed on a Linux HPC cluster with SLURM, Python 3.11, and Cellranger 7.1.0 installed. After downloading the 10x fastq files listed in manifest.txt from GEO and placing them in the preedited/fastq directory, processed files can be generated with the following steps:

For 10x data, run Cellranger and call alleles using bam files.

sbatch preedited/cellranger.slurm
sbatch preedited/call_alleles.slurm

process_10x_invitro.ipynb - perform quality control for 10x in vitro data.
process_merfish_invitro.ipynb - perform quality control for imaging in vitro data.
process_merfish_zombie.ipynb - perform quality control for imaging in vitro data using the zombie protocol.
process_merfish_invivio.ipynb - perform quality control for imaging in vivo data.

Analysis

All preedited analysis and plots can be generated by running

python preedited/plot.py

after processing the raw data or downloading the processed files from Figshare and placing them in the preedited/data directory:

preedited_10x_invitro_alleles.csv
preedited_10x_invitro.h5ad
preedited_merfish_invitro_alleles.csv
preedited_merfish_invitro_cells.json
preedited_merfish_invivo_alleles.csv
preedited_merfish_invivo_cells.json
preedited_merfish_zombie_alleles.csv
preedited_merfish_zombie_cells.json

Barcoded lineage tracing

The barcoded_tracing directory contains code for processing and analyzing 10x single-cell lineage tracing for clones with puro and blast-linked static barcodes serving as independent validation of phylogenetic relationships.

Data processing

Data processing was performed on a Linux HPC cluster with SLURM, Python 3.11, and Cellranger 7.1.0 installed. After downloading the files listed in manifest.txt from GEO and placing them in the barcoded_tracing/fastq directory, processed files can be generated with the following steps:

Run Cellranger and call alleles using bam files.

sbatch barcoded_tracing/cellranger.slurm
sbatch barcoded_tracing/call_alleles.slurm

process_10x.ipynb - performs quality control, phylogenetic reconstruction, and processing of barcode data.

Analysis

All barcoded tracing analysis and plots can be generated by running

python barcoded_tracing/evaluate.py
python barcoded_tracing/plot.py

after processing the raw data or downloading the processed files from Figshare and placing them in the barcoded_tracing/data directory:

barcoded_tracing_clone_1.h5td
barcoded_tracing_clone_2.h5td
barcoded_tracing_clone_3.h5td
barcoded_tracing_clone_4.h5td
barcoded_tracing_clone_5.h5td
barcoded_tracing_clone_6.h5td
barcoded_tracing_alleles.csv

Colony lineage tracing

The colony_tracing directory contains code for processing and analyzing single-cell lineage tracing from colonies generated by sparsely seeding 4T1 cells onto a coverslip.

Data processing

After processing raw images as described in the "Image processing" section, the colony_process_lineage.ipynb notebook was used to segment colonies, perform quality control, and reconstruct phylogenies.

Analysis

All colony plots can be generated by running

python colony_tracing/plot.py

after the following files from Figshare are downloaded and placed in the colony_tracing/data directory:

colony_tracing.h5td
colony_polygons.json

4T1 in vitro heterogeneity

The invitro_heterogeneity directory contains code for processing and analyzing single-cell data characterizing in vitro transcriptional heterogeneity in engineered 4T1 cells used to seed tumors.

Data processing

Data processing was performed on a Linux HPC cluster with SLURM, Python 3.11, and Cellranger 7.1.0 installed. After downloading the files listed in manifest.txt from GEO and placing them in the invitro_heterogeneity/fastq directory, processed files can be generated with the following steps:

Run Cellranger and call alleles using bam files.

sbatch invitro_heterogeneity/cellranger.slurm
sbatch invitro_heterogeneity/call_alleles.slurm

process_10x.ipynb - performs quality control and clustering.

Analysis

All in vitro heterogeneity analysis and plots can be generated by running

python invitro_heterogeneity/plot.py

after processing the raw data or downloading the processed files from Figshare and placing them in the invitro_heterogeneity/data directory:

4T1_invitro.h5ad

4T1 tumor lineage tracing

The tumor_tracing directory contains code for processing and analyzing single-cell transcriptomic and lineage tracing data from the 4T1 syngeneic mouse model of tumor metastasis.

Data processing

After processing raw images as described in the "Image processing" section, the following notebooks were used to generate the mouse 1 data:

M1_resolVI_training.ipynb - trains resolVI model to classify cell types and filter out doublets.
M1_process_MERFISH.ipynb - performs quality control and annotation of the MERFISH data.
M1_segment_tumors.ipynb - aligns tumor sections, segments tumors, and calculates spatial statistics.
M1_process_lineage.ipynb - performs quality control of lineage data and reconstructs phylogenies.

The same process was repeated for the mouse 2 and 3 data, except a new resolVI model was not trained for mouse 2 since the library is shared with mouse 1.

Analysis

All tumor plots can be generated by running

python tumor_tracing/plot.py

after the following files from Figshare are downloaded and placed in the tumor_tracing/data directory:

10x_4T1_primary.h5ad
M1_tumor_tracing.h5td
M1_polygons_grid.json
M2_tumor_tracing.h5td
M2_polygons.json
M3_tumor_tracing.h5td
M3_polygons_grid.json
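
Because the plotting step depends on these downloads, a quick pre-flight check can save a failed run. The sketch below only tests for file presence and does not validate file contents; the helper name missing_files is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Sequence

# File names as listed above.
EXPECTED_FILES = (
    "10x_4T1_primary.h5ad",
    "M1_tumor_tracing.h5td",
    "M1_polygons_grid.json",
    "M2_tumor_tracing.h5td",
    "M2_polygons.json",
    "M3_tumor_tracing.h5td",
    "M3_polygons_grid.json",
)


def missing_files(data_dir: Path, expected: Sequence[str] = EXPECTED_FILES) -> list[str]:
    """Return the expected file names that are absent from data_dir."""
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root.
    for name in missing_files(Path("tumor_tracing/data")):
        print(f"missing: {name}")
```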
