Dr-TSteimle/sv-finder

sv-finder 🔎 🧬

Software for genomic structural variant detection and description.

Quickstart

✨ Installation

  1. Download the latest release:
wget https://github.com/Dr-TSteimle/sv-finder/releases/download/1.1.0/sv-finder
  2. Make the file executable:
chmod +x sv-finder

You can also install the latest version directly from the GitHub repository using cargo:

cargo install --git https://github.com/Dr-TSteimle/sv-finder

🔥 Usage

./sv-finder -h
Usage: sv-finder [OPTIONS] --bam-path <BAM_PATH> --fasta-ref-path <FASTA_REF_PATH> --cytobands-path <CYTOBANDS_PATH>

Options:
  -b, --bam-path <BAM_PATH>                 
  -f, --fasta-ref-path <FASTA_REF_PATH>     
  -o, --output-prefix <OUTPUT_PREFIX>       output file prefix [default: sv-finder]
  -p, --threads <THREADS>                   [default: 1]
  -c, --cytobands-path <CYTOBANDS_PATH>     a cytoband decompressed file, tsv with columns: contig, start, end, cytoband, content.
                                            	- hg19: https://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cytoBandIdeo.txt.gz
                                            	- hg38: https://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/cytoBandIdeo.txt.gz
      --distance-threshold <DISTANCE_THRE>  maximum distance in nucleotids between two misaligned reads
                                            for trying to assemble them together
                                             [default: 350]
      --min-overlapping <MIN_OVERLAPPING>   minimum overlapping length (nt) required for assembling
                                            two reads together
                                             [default: 50]
      --max-consecutive <MAX_CONSECUTIVE>   maximum number of consecutive overlapping mismatch
                                            allowed for assembling reads together
                                             [default: 1]
      --max-mismatches <MAX_DIFFS>          maximum number of overlapping mismatch allowed for
                                            assembling reads together
                                             [default: 3]
      --min-reads <MIN_READS>               minimum reads per cluster
                                             [default: 10]

  -r, --repeat-masker-path <REPEAT_MASKER>
      repeat masker gz file path from UCSC (rmsk.txt.gz)
            - hg19: https://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz
            - hg38: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz
  -h, --help                                Print help (see more with '--help')
  -V, --version                             Print version

The minimal required parameters are --bam-path, --fasta-ref-path and --cytobands-path.

The default parameters should be adequate for short-read analysis.

The algorithm is multi-threaded and written in Rust 🦀 for faster processing.
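
The assembly thresholds in the help text (--min-overlapping, --max-mismatches, --max-consecutive) can be read as constraints on a suffix/prefix overlap between two reads. The Python sketch below is only an illustration of that reading, not sv-finder's actual Rust implementation: the function names `overlap_ok`, `best_overlap` and `merge` are hypothetical, the limits are assumed to be inclusive, and --distance-threshold is not modelled.

```python
from typing import Optional


def overlap_ok(left: str, right: str, overlap: int,
               max_mismatches: int = 3, max_consecutive: int = 1) -> bool:
    """Compare the last `overlap` bases of `left` with the first `overlap`
    bases of `right`, tolerating a bounded number of mismatches."""
    if overlap <= 0 or overlap > min(len(left), len(right)):
        return False
    mismatches = 0  # total mismatches seen in the overlap
    run = 0         # length of the current run of consecutive mismatches
    for a, b in zip(left[-overlap:], right[:overlap]):
        if a != b:
            mismatches += 1
            run += 1
            if mismatches > max_mismatches or run > max_consecutive:
                return False
        else:
            run = 0
    return True


def best_overlap(left: str, right: str, min_overlapping: int = 50,
                 max_mismatches: int = 3,
                 max_consecutive: int = 1) -> Optional[int]:
    """Return the longest acceptable overlap length, or None if there is none."""
    longest = min(len(left), len(right))
    for overlap in range(longest, min_overlapping - 1, -1):
        if overlap_ok(left, right, overlap, max_mismatches, max_consecutive):
            return overlap
    return None


def merge(left: str, right: str, overlap: int) -> str:
    """Join two reads, keeping the overlapping bases from `left`."""
    return left + right[overlap:]


if __name__ == "__main__":
    # Toy reads; a small min_overlapping is used only to keep the example short.
    a, b = "ACGTACGTTT", "CGTTTGGA"
    n = best_overlap(a, b, min_overlapping=4)
    print(n, merge(a, b, n))  # 5 ACGTACGTTTGGA
```

Under this reading, raising --min-overlapping or lowering the mismatch limits makes assembly stricter, at the cost of merging fewer reads per cluster.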

🔑 Results

The output is a TSV file with the following columns:

  • Name of the cluster.
  • Genomic coordinates relative to the reference and to the query (contig:reference_start-reference_stop|query_start-query_stop); the ranges are 1-based and inclusive.
  • The HGVS unique variant identifier.
  • The assembled sequence.

Example of a t(10;14) TLX1-TRD translocation observed in a T-ALL sample, with its derivative:

chr10:821|chr14:80_0	chr14:22907700-22908008|1-309;chr10:102890895-102890934|320-359	chr14:g.22908008_qter[CGTAGCCCCC;chr10:g.102890895_qter]	GTTAATACTTTACAGTTTTATTACTAGAGGGTTAAAATCCTTTTTCAAGTCTGATAATCAATGATTAACTTTCTTCATTTGTCCTTCACCCATTTGTTTTTTAGGTTGATGGTGTTTTACTTATTGATTTGTGTAATTATAATAATTTTGTGTCTGAGTTTTACAGCATTTAACCACAAAAACAGCATTGGTGAAAGGAGTTTCAGGGGTATTGTGGATGGCAGCGGGTGGTGATGGCAAAGTGCCAAGGAAAGGGAAAAAGGAAGAAGAGGGTTTTTATACTGATGTGTTTCATTGTGCCTTCCTACCGTAGCCCCCGATCTCTGGCTCCGGCATCTGTCTCGGCTTCTGGCGTTCCTGGCCCGCGCGGCGGGCCGCCCTC
chr10:821|chr14:81_1	chr14:22918338-22918105|1-234;chr10:102890891-102890827|240-304	chr14:g.22918105_qterinv[TACCG;chr10:g.102890891_qterinv]	ACCCAAGGAAGAACAGCAGTGAGTGAGAGGTCAGCAGCTGTGGTCATCTCCCTGGTCCAGTCAACTTCCTGCTATCCCTTCCAGGCCCCAAAGCAGGGAGGGAAGCTGCTTGCTGTGTTTGTCTCCTGAGGCATGGGACCCAGGGTGAGGATATCCCAGGGAAATGGCACTTTTGCCCCTGCAGTTTTTGTACAGGTCTCTGTAGGTTTTGTAGCACTGTGCGTATCCCCCAGTACCGTGGGACGGAGACCAAGACTCGGAGTAGTTCATGAAGAGAGAGAAGAGGGGAACAAGGCGAGGCTTA
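
The coordinates column can be split into aligned segments for downstream processing. The Python sketch below is a minimal example, assuming that segments are separated by `;` (as in the two rows above) and that a reference start greater than the reference stop indicates a reverse-oriented alignment; the `Segment` type and `parse_segments` function are hypothetical names, not part of sv-finder.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    contig: str
    ref_start: int    # 1-based, inclusive
    ref_stop: int     # 1-based, inclusive
    query_start: int  # position on the assembled sequence
    query_stop: int

    @property
    def reverse(self) -> bool:
        # Assumption: start > stop means the segment aligns in reverse.
        return self.ref_start > self.ref_stop


def parse_segments(field: str) -> List[Segment]:
    """Parse 'contig:ref_start-ref_stop|query_start-query_stop' items joined by ';'."""
    segments = []
    for part in field.split(";"):
        ref, query = part.split("|")
        contig, ref_range = ref.rsplit(":", 1)
        ref_start, ref_stop = (int(x) for x in ref_range.split("-"))
        query_start, query_stop = (int(x) for x in query.split("-"))
        segments.append(Segment(contig, ref_start, ref_stop, query_start, query_stop))
    return segments


if __name__ == "__main__":
    # Coordinates column of the first example row.
    field = "chr14:22907700-22908008|1-309;chr10:102890895-102890934|320-359"
    first, second = parse_segments(field)
    print(first.contig, first.ref_start, first.ref_stop)     # chr14 22907700 22908008
    print(second.contig, second.ref_start, second.ref_stop)  # chr10 102890895 102890934
    # Query bases lying between the two aligned segments.
    print(second.query_start - first.query_stop - 1)         # 10
```

In both example rows the number of query bases between the two segments (10 and 5) equals the length of the inserted sequence given in the HGVS identifier (CGTAGCCCCC and TACCG).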

About

Fast and accurate genomic structural variant caller
