jweissmanlab/PEtracer-2025

PEtracer 2025

This repository contains the code to reproduce all analyses and figures from the manuscript "High-resolution spatial mapping of cell state and lineage dynamics in vivo with PEtracer".

Setup

Python environment

conda env create --file environment.yml
conda activate petracer
ipython kernel install --user --name petracer

The environment.lock.yml file can be used to recreate the environment with the exact package versions used in the paper.

Image processing environment

Image processing was performed on a Linux HPC cluster with the following software installed:

Data availability

  • Processed data is available on Figshare
  • Single-cell RNA-seq data is available on GEO
  • All other sequencing data is available on SRA

Simulation

The simulation directory contains code for simulating lineage tracing data with a variety of parameters. To run simulations:

python simulation/simulate.py

To generate simulation plots:

python simulation/plot.py

Prime editing strategy selection

The insertvariants and RTT_optimization directories contain code for processing and analyzing amplicon sequencing data used to select edit sites and optimize editing strategies for the PEtracer system.

Data processing

Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running

./crispresso.sh
Rscript ../../scripts/make_CRISPResso_summary.R ./ CRISPResso_summary.txt

after downloading the fastq files listed in manifest.txt from SRA and placing them in the strategy_selection/insertvariants/fastq directory.

Analysis

plot.ipynb - generate strategy selection plots.

5nt insert selection

The insert_selection directory contains code for processing and analyzing target site sequencing data used to determine the installation efficiencies of all 1024 5nt insertions for each edit site.
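The figure of 1024 follows from four possible bases at each of five positions (4^5 = 1024). As a minimal illustration, not taken from the repository code, the full set of 5nt insertions can be enumerated like this:

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def all_inserts(length=5):
    """Return every DNA sequence of the given length in lexicographic order."""
    return ["".join(bases) for bases in product(NUCLEOTIDES, repeat=length)]


inserts = all_inserts(5)
print(len(inserts))   # 4**5 = 1024
print(inserts[:3])    # ['AAAAA', 'AAAAC', 'AAAAG']
```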

Data processing

  1. Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running
./crispresso.sh

after downloading the fastq files listed in manifest.txt from SRA and placing them in the insert_selection/fastq directory.

  2. aggregate_crispresso.ipynb - aggregate CRISPResso output files for all sites.
  3. crosshyb.py - estimate 5nt insert cross-hybridization

Analysis

To generate insert selection plots:

python insert_selection/plot.py

Insert validation

The insert_validation directory contains code for processing and analyzing amplicon sequencing data used for arrayed validation of the top 5nt insertions for each edit site.

Data processing

Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running

./crispresso.sh
Rscript ../scripts/make_CRISPResso_summary.R ./ CRISPResso_summary.txt

after downloading the fastq files listed in manifest.txt from SRA and placing them in the insert_validation/fastq directory.

Analysis

plot.ipynb - generate arrayed validation plots for top 5nt insertions.

Orthogonalization

The orthogonalization directory contains code for processing and analyzing amplicon sequencing data used for validating orthogonalized versions of the RNF2, HEK3, and EMX1 edit sites.

Data processing

Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running

./crispresso.sh
Rscript ../scripts/make_CRISPResso_summary.R ./ CRISPResso_summary.txt

after downloading the fastq files listed in manifest.txt from SRA and placing them in the orthogonalization/fastq directory.

Analysis

plot.ipynb - generate orthogonalization plots.

Orthogonal insert validation

The orthogonal_insert_validation directory contains code for processing and analyzing amplicon sequencing data used for validating the top 20 5nt insertions at each orthogonalized edit site.

Data processing

Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running

./crispresso.sh
Rscript ../scripts/make_CRISPResso_summary.R ./ CRISPResso_summary.txt

after downloading the fastq files listed in manifest.txt from SRA and placing them in the orthogonal_insert_validation/fastq directory.

Analysis

plot.ipynb - generate plots for top 20 5nt insertions.

pegArray balance

The peg_arrays directory contains code for processing and analyzing target site sequencing data used to determine the LM installation balance for various pegArrays.

Data processing

Sequencing data was processed on a Linux HPC cluster with SLURM, Python 3.11, and CRISPResso 2.2.7 installed. Processed files can be generated by running

sbatch peg_arrays/crispresso.slurm
python peg_arrays/count_alleles.py

after downloading the fastq files listed in manifest.txt from SRA and placing them in the peg_arrays/fastq directory.

Analysis

To generate pegArray plots:

python peg_arrays/plot.py

pegRNA variant kinetics

The kinetics directory contains code for processing and analyzing 10x data for 4T1 and B16 cells transduced with a library of pegRNA variants to test editing kinetics.

Data processing

10x data was processed on a Linux HPC cluster with SLURM, Python 3.11, and Cellranger 7.1.0 installed. Processed files can be generated with the following steps:

  1. Run Cellranger and call alleles using bam files.
sbatch kinetics/cellranger.slurm
sbatch kinetics/call_alleles.slurm
  2. process_4T1_10x.ipynb - perform quality control, call pegRNA variants, and determine edit fraction for 4T1 cells.
  3. process_B16F10_10x.ipynb - perform quality control, call pegRNA variants, and determine edit fraction for B16F10 cells.

Before running these steps, download the 10x fastq files listed in manifest.txt from GEO and place them in the kinetics/fastq directory.
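The processing notebooks above determine an edit fraction for each pegRNA variant. The sketch below shows one plausible way to compute such a fraction from per-site allele calls. The record layout (a variant label paired with an edited flag) and the variant names are hypothetical, and the notebooks may define or filter the quantity differently.

```python
from collections import defaultdict


def edit_fraction_by_variant(records):
    """Fraction of edited target sites per pegRNA variant.

    Each record is a (variant, is_edited) pair. Both fields are hypothetical
    stand-ins for whatever columns the real allele table uses.
    """
    edited = defaultdict(int)
    total = defaultdict(int)
    for variant, is_edited in records:
        total[variant] += 1
        edited[variant] += int(is_edited)
    return {variant: edited[variant] / total[variant] for variant in total}


# Toy data: four observed sites per variant.
toy_records = [
    ("variant_A", True), ("variant_A", False),
    ("variant_A", True), ("variant_A", True),
    ("variant_B", False), ("variant_B", False),
    ("variant_B", True), ("variant_B", False),
]
fractions = edit_fraction_by_variant(toy_records)
print(fractions)  # {'variant_A': 0.75, 'variant_B': 0.25}
```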

Analysis

All kinetics analysis and plots can be generated by running

python kinetics/estimate_rate.py
python kinetics/plot.py

after processing the raw data or downloading the processed files from Figshare and placing them in the kinetics/data directory:

  • 4T1_kinetics_alleles.csv
  • 4T1_kinetics_cells.csv
  • 4T1_kinetics.h5ad
  • B16F10_kinetics_alleles.csv
  • B16F10_kinetics_cells.csv
  • B16F10_kinetics.h5ad

Integration barcode design

Jupyter notebooks detailing the integration barcode design are in the design_intBC folder.

Probe design

Details of the MERFISH and PEtracer probe design are in the design_probes folder. This part requires the MERFISH_probe_design package to be installed.

Image processing

The image_processing directory contains code for processing imaging data. Raw imaging files are not publicly available due to file size, but code can be used to process other imaging data in the same format. Processed files for each experiment (e.g. 241213_F320-4-3_MF4++) can be generated with the following steps:

  1. Nuclei segmentation using Cellpose and Deconwolf
sbatch image_processing/241213_F320-4-3_MF4++/Scripts/cellpose.slurm
  2. MERFISH transcript decoding using MERLin

Download the newest version of MERLin here: v0.1.8

Install MERLin with:

conda create -n merlin_py310 python=3.10
conda activate merlin_py310
conda install h5py rtree pytables setuptools urllib3 python-dotenv pandas tifffile
conda install scikit-image scikit-learn scipy matplotlib networkx seaborn
conda install pytest pytest-cov numexpr cython requests boto3 xmltodict google-cloud-storage docutils pillow
pip install opencv-python pyqt5 sphinx-rtd-theme snakemake pyclustering tables cellpose
pip install -e MERLin

Test that the installation works with:

merlin -h

Before using MERLin for the first time, configure it with:

merlin --configure .

Then follow the prompts.

Run MERLin:

Example command:

merlin -a 20241007-MF4_TestPreprocess.json \
		-o 20240812-MF4_16bit.csv \
		-c MF4dna_codebook.csv \
		-m merscope01_microscope.json \
		-p 20240812_positions.txt \
		-e /lab/weissman_imaging/puzheng/4T1Tumor \
		-s /lab/weissman_imaging/puzheng/MERFISH_analysis/4T1 \
		-k run_MF4_cellpose.json \
		-n 2 \
		--no_report True \
		20240812-F319-12-0807_MF4dna-mCh

Example parameter files are provided in the merlin_parameters folder. Keep its subfolder structure intact, and in the configuration step set PARAMETER_HOME to the absolute path of this merlin_parameters folder.

  3. Assignment of cytoplasmic transcripts to nuclei using Proseg
sbatch image_processing/241213_F320-4-3_MF4++/Scripts/proseg.slurm
  4. Alignment of MERFISH and lineage imaging data using fishtank
sbatch image_processing/241213_F320-4-3_MF4++/Scripts/align_experiments.slurm
  5. T7 amplicon detection and quantification using fishtank
sbatch image_processing/241213_F320-4-3_MF4++/Scripts/detect_spots.slurm
  6. T7 amplicon decoding and cell assignment using fishtank
sbatch image_processing/241213_F320-4-3_MF4++/Scripts/decode_spots.slurm

This process was repeated for each imaging experiment, except for experiments without MERFISH data, which only required steps 1, 5, and 6.
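The step selection described above can be sketched as a small helper that lists the sbatch commands for one experiment. This is an illustrative sketch rather than part of the repository: the function names and the example experiment name are hypothetical, and it assumes every experiment folder follows the same Scripts layout as the example. It only builds the command strings and does not submit anything.

```python
from pathlib import PurePosixPath

# Script names as listed in the steps above. The MERLin decoding step is run
# with the merlin command rather than a SLURM script, so it maps to None.
STEP_SCRIPTS = {
    1: "cellpose.slurm",
    2: None,
    3: "proseg.slurm",
    4: "align_experiments.slurm",
    5: "detect_spots.slurm",
    6: "decode_spots.slurm",
}


def steps_for(has_merfish):
    """Experiments without MERFISH data only need steps 1, 5, and 6."""
    return [1, 2, 3, 4, 5, 6] if has_merfish else [1, 5, 6]


def sbatch_commands(experiment, has_merfish):
    """Return the sbatch command lines for one experiment, in step order."""
    scripts_dir = PurePosixPath("image_processing") / experiment / "Scripts"
    commands = []
    for step in steps_for(has_merfish):
        script = STEP_SCRIPTS[step]
        if script is not None:  # skip the MERLin step, which has no script here
            commands.append(f"sbatch {scripts_dir / script}")
    return commands


# "example_experiment" is a placeholder, not a real experiment folder.
lineage_only = sbatch_commands("example_experiment", has_merfish=False)
for command in lineage_only:
    print(command)
```

With has_merfish=True the helper returns five commands, since the MERLin step still has to be run separately between nuclei segmentation and Proseg.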

Predefined lineage mark validation

The preedited directory contains code for processing and analyzing 10x and imaging-based readout of lineage tracing data from cells with predefined linkage between intBCs and lineage marks.

Data processing

Imaging data was processed as described in the "Image processing" section. 10x data was processed on a Linux HPC cluster with SLURM, Python 3.11, and Cellranger 7.1.0 installed. Processed files can be generated with the following steps:

  1. For 10x data, run Cellranger and call alleles using bam files.
sbatch preedited/cellranger.slurm
sbatch preedited/call_alleles.slurm
  2. process_10x_invitro.ipynb - perform quality control for 10x in vitro data.
  3. process_merfish_invitro.ipynb - perform quality control for imaging in vitro data.
  4. process_merfish_zombie.ipynb - perform quality control for imaging in vitro data using the zombie protocol.
  5. process_merfish_invivio.ipynb - perform quality control for imaging in vivo data.

Before running these steps, download the 10x fastq files listed in manifest.txt from GEO and place them in the preedited/fastq directory.

Analysis

All preedited analysis and plots can be generated by running

python preedited/plot.py

after processing the raw data or downloading the processed files from Figshare and placing them in the preedited/data directory:

  • preedited_10x_invitro_alleles.csv
  • preedited_10x_invitro.h5ad
  • preedited_merfish_invitro_alleles.csv
  • preedited_merfish_invitro_cells.json
  • preedited_merfish_invivo_alleles.csv
  • preedited_merfish_invivo_cells.json
  • preedited_merfish_zombie_alleles.csv
  • preedited_merfish_zombie_cells.json

Barcoded lineage tracing

The barcoded_tracing directory contains code for processing and analyzing 10x single-cell lineage tracing data for clones with puro- and blast-linked static barcodes, which serve as independent validation of phylogenetic relationships.

Data processing

Data processing was performed on a Linux HPC cluster with SLURM, Python 3.11, and Cellranger 7.1.0 installed. Processed files can be generated with the following steps:

  1. Run Cellranger and call alleles using bam files.
sbatch barcoded_tracing/cellranger.slurm
sbatch barcoded_tracing/call_alleles.slurm
  2. process_10x.ipynb - performs quality control, phylogenetic reconstruction, and processing of barcode data.

Before running these steps, download the files listed in manifest.txt from GEO and place them in the barcoded_tracing/fastq directory.

Analysis

All barcoded tracing analysis and plots can be generated by running

python barcoded_tracing/evaluate.py
python barcoded_tracing/plot.py

after processing the raw data or downloading the processed files from Figshare and placing them in the barcoded_tracing/data directory:

  • barcoded_tracing_clone_1.h5td
  • barcoded_tracing_clone_2.h5td
  • barcoded_tracing_clone_3.h5td
  • barcoded_tracing_clone_4.h5td
  • barcoded_tracing_clone_5.h5td
  • barcoded_tracing_clone_6.h5td
  • barcoded_tracing_alleles.csv

Colony lineage tracing

The colony_tracing directory contains code for processing and analyzing single-cell lineage tracing from colonies generated by sparsely seeding 4T1 cells onto a coverslip.

Data processing

After processing raw images as described in the "Image processing" section, the colony_process_lineage.ipynb notebook was used to segment colonies, perform quality control, and reconstruct phylogenies.

Analysis

All colony plots can be generated by running

python colony_tracing/plot.py

after the following files from Figshare are downloaded and placed in the colony_tracing/data directory:

  • colony_tracing.h5td
  • colony_polygons.json

4T1 in vitro heterogeneity

The invitro_heterogeneity directory contains code for processing and analyzing single-cell data characterizing in vitro transcriptional heterogeneity in engineered 4T1 cells used to seed tumors.

Data processing

Data processing was performed on a Linux HPC cluster with SLURM, Python 3.11, and Cellranger 7.1.0 installed. Processed files can be generated with the following steps:

  1. Run Cellranger and call alleles using bam files.
sbatch invitro_heterogeneity/cellranger.slurm
sbatch invitro_heterogeneity/call_alleles.slurm
  2. process_10x.ipynb - performs quality control and clustering.

Before running these steps, download the files listed in manifest.txt from GEO and place them in the invitro_heterogeneity/fastq directory.

Analysis

All in vitro heterogeneity analysis and plots can be generated by running

python invitro_heterogeneity/plot.py

after processing the raw data or downloading the processed files from Figshare and placing them in the invitro_heterogeneity/data directory:

  • 4T1_invitro.h5ad

4T1 tumor lineage tracing

The tumor_tracing directory contains code for processing and analyzing single-cell transcriptomic and lineage tracing data from the 4T1 syngeneic mouse model of tumor metastasis.

Data processing

After processing raw images as described in the "Image processing" section, the following notebooks were used to generate the mouse 1 data:

  1. M1_resolVI_training.ipynb - trains resolVI model to classify cell types and filter out doublets.
  2. M1_process_MERFISH.ipynb - performs quality control and annotation of the MERFISH data.
  3. M1_segment_tumors.ipynb - aligns tumor sections, segments tumors, and calculates spatial statistics.
  4. M1_process_lineage.ipynb - performs quality control of lineage data and reconstructs phylogenies.

The same process was repeated for the mouse 2 and 3 data, except a new resolVI model was not trained for mouse 2 since the library is shared with mouse 1.

Analysis

All tumor plots can be generated by running

python tumor_tracing/plot.py

after the following files from Figshare are downloaded and placed in the tumor_tracing/data directory:

  • 10x_4T1_primary.h5ad
  • M1_tumor_tracing.h5td
  • M1_polygons_grid.json
  • M2_tumor_tracing.h5td
  • M2_polygons.json
  • M3_tumor_tracing.h5td
  • M3_polygons_grid.json
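Before running the plotting script, it can help to confirm that every file in the list above has been downloaded. The helper below is a minimal sketch and not part of the repository. The file names come from the list above, while the function name is hypothetical, and the demonstration uses a temporary directory so that it runs anywhere.

```python
import tempfile
from pathlib import Path

# File names copied from the list above.
TUMOR_TRACING_FILES = [
    "10x_4T1_primary.h5ad",
    "M1_tumor_tracing.h5td",
    "M1_polygons_grid.json",
    "M2_tumor_tracing.h5td",
    "M2_polygons.json",
    "M3_tumor_tracing.h5td",
    "M3_polygons_grid.json",
]


def missing_files(data_dir, expected):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


# Demonstration: a temporary directory containing only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "10x_4T1_primary.h5ad").touch()
    demo_missing = missing_files(tmp, TUMOR_TRACING_FILES)

print(demo_missing)  # the six files that were not created
```

In practice the call would be missing_files("tumor_tracing/data", TUMOR_TRACING_FILES), and an empty list means all inputs are in place.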

About

Code for Koblan, Yost, Zheng, Colgan, et al. Science (2025)
