SCALPEL is a robust pipeline designed for transcript isoform quantification and alternative polyadenylation (APA) characterization using 3'-tagged single-cell RNA-seq (scRNA-seq) data. Built with Nextflow, it integrates multiple processing steps including read quantification, APA annotation, and isoform usage analysis.
Requirements: Nextflow v24.10.6 (available from the official page or via Conda).
SCALPEL can be installed and run using one of the following options:
- Clone the repository
git clone https://github.com/p-CMRC-LAB/SCALPEL.git
cd SCALPEL
- Create the Conda environment
conda env create -f requirements.yml
conda activate scalpel_conda
- Run SCALPEL within the environment
nextflow run -resume main.nf \
--sequencing chromium \
--samplesheet path/to/samplesheet.csv \
--transcriptome path/to/gencode.transcripts.fa \
--gtf path/to/gencode.annotation.gtf \
--ipdb path/to/mm10.polyA.track \
--barcodes path/to/barcodes.csv \
--clusters path/to/clusters.txt
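A missing input file is usually cheaper to catch before Nextflow starts than after the first process fails. The sketch below is one way to do that. `check_inputs` is a hypothetical helper, not part of SCALPEL, and it only tests that each path exists.

```shell
# Pre-flight check: report every input path that does not exist.
# check_inputs is a hypothetical helper, not part of SCALPEL.
check_inputs() {
    local missing=0
    local path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing input: $path" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every path was found.
    [ "$missing" -eq 0 ]
}
```

Chaining it in front of the run command, for example `check_inputs path/to/samplesheet.csv path/to/gencode.annotation.gtf && nextflow run -resume main.nf` followed by the usual parameters, launches the pipeline only when every listed file is present.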
Alternatively, download the prebuilt Apptainer container, which bundles all SCALPEL dependencies:
- Download the container and clone the repository
wget https://data.cyverse.org/dav-anon/iplant/home/franzx5/SCALPEL.container.sif
git clone https://github.com/p-CMRC-LAB/SCALPEL.git
cd SCALPEL
- Run SCALPEL using the container
nextflow run /path/to/SCALPEL/main.nf \
-with-apptainer /path/to/SCALPEL.container.sif \
--sequencing chromium \
--samplesheet path/to/samplesheet.csv \
--transcriptome path/to/gencode.transcripts.fa \
--gtf path/to/gencode.annotation.gtf \
--ipdb path/to/mm10.polyA.track
| Parameter | Description |
|---|---|
| `--samplesheet` | CSV with sample names and paths to FASTQ/BAM/CellRanger output |
| `--transcriptome` | FASTA of reference transcriptome |
| `--gtf` | GTF annotation file |
| `--ipdb` | Internal priming annotation file |
| `--barcodes` | (Optional) Barcode whitelist per sample |
| `--clusters` | (Optional) Tab-delimited file with cell-to-cluster mappings |
| `--sequencing` | Must be `chromium` or `dropseq` |
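Because `--sequencing` accepts only two values, a wrapper script can reject a typo before the pipeline is launched. The following is a minimal sketch. `validate_sequencing` is a hypothetical helper name, and SCALPEL may well perform its own check internally.

```shell
# Accept only the two documented values for --sequencing.
# validate_sequencing is a hypothetical helper, not part of SCALPEL.
validate_sequencing() {
    case "$1" in
        chromium|dropseq)
            return 0
            ;;
        *)
            echo "unsupported --sequencing value: '$1' (expected chromium or dropseq)" >&2
            return 1
            ;;
    esac
}
```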
Reference files:
After execution, SCALPEL generates a `results/` directory containing key outputs for downstream analysis.
| File / Pattern | Description |
|---|---|
| `*_filtered.bam` | BAM files with deduplicated reads excluding internal priming artifacts. |
| `*_filtered.bam.bai` | BAM index files. |
| `*_APADGE.txt` | APA-aware isoform-level expression matrix per sample. |
| `*_seurat.RDS` | Seurat object per sample. |
| `iDGE_seurat.RDS` | Merged Seurat object across all samples. |
| `DIU_table.csv` | Differential isoform usage table. |
| `Runfiles/` | Execution logs and process metadata. |
- Output filenames are prefixed by the sample name.
- Seurat `.RDS` files are ready for downstream visualization and clustering in R.
- `*_APADGE.txt` matrices are compatible with other statistical environments.
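After a run, it can be useful to confirm that the per-sample outputs listed above were actually written. The sketch below makes two assumptions: each file is named exactly `<sample>` followed by the suffix from the table, and the layout under the results directory is unknown, so it searches recursively. `report_outputs` is a hypothetical helper, not part of SCALPEL.

```shell
# Report which expected per-sample outputs exist under a results directory.
# report_outputs is a hypothetical helper; the suffixes come from the output table.
report_outputs() {
    local results_dir=$1
    local sample=$2
    local suffix hit
    local status=0
    for suffix in _filtered.bam _filtered.bam.bai _APADGE.txt _seurat.RDS; do
        # Search recursively, since no particular sub-directory layout is assumed.
        hit=$(find "$results_dir" -type f -name "${sample}${suffix}" | head -n 1)
        if [ -n "$hit" ]; then
            echo "found   ${sample}${suffix}"
        else
            echo "missing ${sample}${suffix}"
            status=1
        fi
    done
    # Non-zero when at least one expected file is absent.
    return "$status"
}
```

Calling `report_outputs results SAMPLE_NAME` prints one line per expected file and returns a non-zero status if any is missing.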
For downstream analysis tutorials, visit:
- Example of SCALPEL application on 10X scRNA-seq
- Example of SCALPEL application on DropSeq scRNA-seq
- Downstream analysis Wiki
To modify resource usage and process settings, edit the `nextflow.config` file. For example:
executor {
name = 'slurm' // Use 'local', 'slurm', etc.
cpus = 64
}
process {
withLabel: big_mem {
cpus = 4
memory = '8 GB'
}
withLabel: small_mem {
cpus = 2
memory = '2 GB'
}
// Additional process-specific settings...
}
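As an alternative to editing `nextflow.config` in place, Nextflow accepts an extra configuration file through its `-c` option, and settings there should take precedence over the project file. The sketch below writes such a file for the `big_mem` label. `write_override_config`, the output filename, and the resource values are all hypothetical.

```shell
# Write a small Nextflow config that overrides resources for one label.
# write_override_config is a hypothetical helper, not part of SCALPEL.
write_override_config() {
    local out=$1
    local cpus=$2
    local memory=$3
    # Unquoted heredoc delimiter so that ${cpus} and ${memory} are expanded.
    cat > "$out" <<EOF
// Override for the big_mem label only (illustrative values).
process {
    withLabel: big_mem {
        cpus = ${cpus}
        memory = '${memory}'
    }
}
EOF
}
```

For instance, `write_override_config my_resources.config 8 '16 GB'` creates the file, which can then be passed to the run command as `-c my_resources.config`.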
If using Apptainer, make sure to bind your local SCALPEL repository path inside the container by editing the following block in `nextflow.config`:
apptainer {
enabled = true
autoMounts = true
runOptions = "--bind /path/to/SCALPEL:/path/to/SCALPEL"
}
Adjust `/path/to/SCALPEL` to the absolute path of the SCALPEL repository on your system.
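The bind path must be absolute, so a relative or symlinked clone location has to be resolved first. The sketch below prints a ready-to-paste `runOptions` line for a given directory. `bind_option` is a hypothetical helper, not part of SCALPEL.

```shell
# Print the runOptions line for the directory that holds the SCALPEL clone.
# bind_option is a hypothetical helper, not part of SCALPEL.
bind_option() {
    local repo_abs
    # cd + pwd -P resolves relative paths and symlinks to a physical absolute path.
    repo_abs=$(CDPATH= cd -- "$1" && pwd -P) || return 1
    printf 'runOptions = "--bind %s:%s"\n' "$repo_abs" "$repo_abs"
}
```

Running `bind_option .` from inside the cloned repository prints the line with the resolved path filled in on both sides of the colon.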
Franz Ake, Sandra M. Fernández-Moya, Marcel Schilling, Akshay Jaya Ganesh, Ana Gutiérrez-Franco, Lei Li, Mireya Plass
Quantification of transcript isoforms at the single-cell level using SCALPEL
bioRxiv 2024.06.21.600022; https://doi.org/10.1101/2024.06.21.600022
Franz AKE – @aerodx5 – fake@idibell.cat
GitHub: https://github.com/p-CMRC-LAB/SCALPEL