IsoSV

A pipeline for detecting structural variants from RNA-seq data

Team

Overview

Detecting structural variants (SVs) from RNA-seq data presents unique challenges. Unlike DNA sequencing reads, RNA reads span spliced transcripts, producing complex CIGAR patterns that include skipped regions (N), soft clips, insertions, deletions, and split alignments. Standard DNA-based SV callers often misinterpret these signals, leading to missed or misclassified events. To address this, we developed IsoSV, a pipeline that scans RNA-seq BAM files to identify candidate SVs by parsing CIGAR operations, split-read (SA) tags, and, optionally, exon annotations. The method distinguishes expected introns from potential novel splice junctions or structural rearrangements and reports each event in both TSV and VCF formats for downstream analysis. Finally, we validate the SV findings against known DNA variants. The pipeline has three steps: IsoParser, IsoClustering, and IsoAnnotator.

Gene/Transcript Fusion vs RNA SV

RNA structural variants are any transcript-level rearrangements observed in RNA-seq reads, whereas transcript fusions are specific chimeric transcripts joining exons from two separate genes, often reflecting underlying DNA rearrangements.

Workflow

The IsoSV workflow identifies and evaluates candidate structural variants from RNA sequencing data. It begins by parsing BAM files from RNA-seq data (long-read and short-read): input alignments are filtered by mapping quality (XXXX), and candidate variants larger than 30-50 bp are extracted and exported to text or VCF files. Next, an interval data structure is used to cluster similar candidate entries, producing consolidated variant calls in text or VCF format. To enrich for biological relevance, external annotation resources such as BED or GFF files are used to prioritize and annotate candidates, producing an updated VCF of annotated structural variants. The resulting VCF files can be inspected in IGV. Finally, a benchmark dataset (HG002) serves as the truth set for evaluation with bedtools, and RNA-seq coverage profiles are used to validate expression of the candidate events.

To benchmark performance, we constructed a GIAB truth set of large indels by combining two sources: (i) structural variant calls annotated with SVTYPE and SVLEN, and (ii) indel calls whose reference and alternative alleles differ in length by ≥30 bp. Each variant was represented as a ±10 bp window in BED format. Overlap between Clair3-RNA candidate large indels and the GIAB truth set was assessed using bedtools intersect. A candidate was considered validated if it overlapped any GIAB indel within this positional tolerance.
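The benchmarking rule described above (a length-difference filter, a ±10 bp window, and "any overlap counts") can be illustrated in a few lines of Python. This is only a sketch of the logic, not the project's evaluation script, which relies on bedtools intersect. The function names `is_large_indel`, `to_window`, and `validate` are hypothetical, and padding both the start and the end of each variant is an assumption about how the window was built.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED

def is_large_indel(ref: str, alt: str, min_len: int = 30) -> bool:
    """True if REF and ALT alleles differ in length by at least min_len bp."""
    return abs(len(ref) - len(alt)) >= min_len

def to_window(chrom: str, start: int, end: int, pad: int = 10) -> Interval:
    """Pad a variant by `pad` bp on each side (assumed window construction)."""
    return (chrom, max(0, start - pad), end + pad)

def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open interval overlap on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def validate(candidates: List[Interval], truth: List[Interval]) -> List[bool]:
    """Flag each candidate that overlaps at least one truth window."""
    return [any(overlaps(c, t) for t in truth) for c in candidates]

# Illustrative coordinates only; these are not real calls.
truth = [to_window("chr1", 1000, 1000)]
candidates = [
    to_window("chr1", 1015, 1015),  # windows share bases 1005-1009
    to_window("chr1", 1020, 1020),  # windows only touch at 1010
    to_window("chr2", 1000, 1000),  # different chromosome
]
flags = validate(candidates, truth)
```

The pairwise scan is quadratic and only suitable for small examples; for whole-genome call sets a sorted sweep or bedtools itself is the practical choice.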

Installation

🚀 Getting Started

IsoSV is implemented as a three-stage workflow:

Step 1: IsoParser - Identifies candidate events by scanning BAM alignments for signatures of SVs encoded in CIGAR strings and supplementary alignment tags. It flags long insertions, deletions, skipped regions, and soft clips while recording read-level support.
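To make the CIGAR-scanning idea concrete, here is a minimal sketch in plain Python that walks a CIGAR string and reports long insertions, deletions, skipped regions, and soft clips. It is not the IsoParser implementation: the function name `cigar_candidates`, the event labels, and the 30 bp default threshold are assumptions chosen for illustration, and supplementary (SA) tag handling is left out.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the reference coordinate.
REF_CONSUMING = set("MDN=X")
# Hypothetical labels for the operations of interest.
LABELS = {"I": "INS", "D": "DEL", "N": "SKIP", "S": "SOFTCLIP"}

def cigar_candidates(ref_start: int, cigar: str,
                     min_len: int = 30) -> List[Tuple[str, int, int, int]]:
    """Return (label, ref_start, ref_end, length) for long I/D/N/S operations.

    ref_start is the 0-based leftmost reference position of the alignment.
    Insertions and soft clips do not consume the reference, so their
    start and end coordinates are identical.
    """
    events = []
    pos = ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in LABELS and length >= min_len:
            end = pos + length if op in REF_CONSUMING else pos
            events.append((LABELS[op], pos, end, length))
        if op in REF_CONSUMING:
            pos += length
    return events

# Made-up alignment starting at reference position 1000.
events = cigar_candidates(1000, "40S100M45I50M500N60M35D20M")
for event in events:
    print(event)
```

With a BAM file, the same loop would typically run per read over the alignment's parsed CIGAR, counting how many reads support each event.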

Running IsoParser

Step 2: IsoClustering - Processes candidate events to merge overlapping signals across reads. It uses an interval-tree data structure for querying and consolidating nearby events. This clustering step helps distinguish true biological SVs from alignment noise.
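The consolidation step can be sketched as follows. Note that this example substitutes a sorted sweep for the interval tree that IsoClustering uses; for a one-off batch merge the two give the same clusters, and the sweep needs no extra dependency. The function name `cluster_events`, the `slop` tolerance, and the output fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int, int, str]  # (chrom, start, end, svtype)

def cluster_events(events: Iterable[Event], slop: int = 10) -> List[Dict]:
    """Merge events of the same type on the same chromosome whose
    intervals overlap or lie within `slop` bp, counting supporting events."""
    groups = defaultdict(list)
    for chrom, start, end, svtype in events:
        groups[(chrom, svtype)].append((start, end))

    clusters = []
    for (chrom, svtype), intervals in sorted(groups.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        support = 1
        for start, end in intervals[1:]:
            if start <= cur_end + slop:
                # Close enough to the running cluster: extend it.
                cur_end = max(cur_end, end)
                support += 1
            else:
                clusters.append({"chrom": chrom, "start": cur_start,
                                 "end": cur_end, "svtype": svtype,
                                 "support": support})
                cur_start, cur_end, support = start, end, 1
        clusters.append({"chrom": chrom, "start": cur_start, "end": cur_end,
                         "svtype": svtype, "support": support})
    return clusters

# Made-up read-level events.
clusters = cluster_events([
    ("chr1", 1000, 1050, "DEL"),
    ("chr1", 1005, 1052, "DEL"),
    ("chr1", 5000, 5040, "DEL"),
    ("chr1", 1003, 1003, "INS"),
])
```

A minimum `support` cutoff applied to the resulting clusters is one simple way to separate recurrent signals from single-read alignment noise.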

Running IsoClustering

Step 3: IsoAnnotator - Overlays SVs with transcript annotations, classifying them as exon deletions, gene fusions, canonical splice events, or intronic rearrangements. This annotation ensures that SV calls are interpretable in the context of known genes.
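One way such a classification could work is shown below. This is an illustrative rule set, not IsoAnnotator's actual logic: the function name `classify_event`, the order in which rules are checked, the 2 bp boundary tolerance, and the extra `unannotated` fallback category are all assumptions. Exons are given as half-open `(gene, start, end)` tuples, as they might be after loading a BED or GFF file.

```python
from typing import Dict, List, Optional, Tuple

Exon = Tuple[str, int, int]  # (gene, exon_start, exon_end), half-open

def classify_event(start: int, end: int, exons: List[Exon], tol: int = 2) -> str:
    """Classify an event spanning [start, end) against exon annotations."""
    by_gene: Dict[str, List[Tuple[int, int]]] = {}
    for gene, s, e in exons:
        by_gene.setdefault(gene, []).append((s, e))
    for gene_exons in by_gene.values():
        gene_exons.sort()

    # 1. Event matches a known intron (gap between consecutive exons).
    for gene_exons in by_gene.values():
        for (_, e1), (s2, _) in zip(gene_exons, gene_exons[1:]):
            if abs(start - e1) <= tol and abs(end - s2) <= tol:
                return "canonical_splice"

    def gene_at(pos: int) -> Optional[str]:
        """Gene whose exon span (first exon start to last exon end) holds pos."""
        for gene, gene_exons in by_gene.items():
            if gene_exons[0][0] <= pos < gene_exons[-1][1]:
                return gene
        return None

    left, right = gene_at(start), gene_at(end)
    # 2. Breakpoints fall in two different genes.
    if left and right and left != right:
        return "gene_fusion"
    # 3. Event covers at least one base of an annotated exon.
    if any(start < e and end > s for _, s, e in exons):
        return "exon_deletion"
    # 4. Event lies inside a gene but touches no exon.
    if left and left == right:
        return "intronic"
    return "unannotated"

# Toy annotation with two hypothetical genes.
exons = [("GENEA", 100, 200), ("GENEA", 300, 400), ("GENEA", 500, 600),
         ("GENEB", 1000, 1100), ("GENEB", 1200, 1300)]
labels = [classify_event(s, e, exons) for s, e in
          [(200, 300), (200, 500), (550, 1050), (220, 260), (700, 800)]]
```

Checking known introns first matters for RNA-seq, since most skipped regions (N) in spliced reads are expected introns rather than structural events.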

Running IsoAnnotator

Presentation

https://docs.google.com/presentation/d/1-pTwId0y6V8OCrv-FYEJuwpxVq7wN8G9hY5ynpy-XS0/edit?usp=sharing

Examples

The examples below highlight a deletion and an insertion event identified in the RNA-seq data and confirmed through visualization in short-read WGS data.
