EichlerLab/asap

ASAP

Autism Susceptibility Analysis Pipeline with a focus on structural variants.

This repo logs the tasks involved in this project, whether executed sequentially or asynchronously.

Inputs

Sample origin/cohort

This pertains to the combined SSC + SAGE + Rett-like cohort (189 individuals from 51 families).

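| Count | Sex (proband-sibling) | Family type |
|-------|-----------------------|-------------|
| 12    | F-F                   | quad        |
| 16    | F-M                   | quad        |
| 3     | M-F                   | quad        |
| 5     | M-M                   | quad        |
| 13    | F                     | trio        |
| 2     | M                     | trio        |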
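
As a quick consistency check, the rows above can be tallied against the cohort totals quoted earlier. This is a minimal sketch that assumes a quad contributes four individuals (proband, sibling, two parents) and a trio three (proband, two parents); the variable names are illustrative only.

```python
# Rows copied from the table above: (family count, proband-sibling sex, family type).
COHORT = [
    (12, "F-F", "quad"),
    (16, "F-M", "quad"),
    (3, "M-F", "quad"),
    (5, "M-M", "quad"),
    (13, "F", "trio"),
    (2, "M", "trio"),
]

# Assumption: quad = proband + sibling + 2 parents, trio = proband + 2 parents.
MEMBERS_PER_FAMILY = {"quad": 4, "trio": 3}

families = sum(count for count, _, _ in COHORT)
individuals = sum(count * MEMBERS_PER_FAMILY[kind] for count, _, kind in COHORT)

print(f"{families} families, {individuals} individuals")  # 51 families, 189 individuals
```

Under those assumptions the tally matches the stated 189 individuals from 51 families.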

Sample sheet is here: /net/eichler/vol28/projects/autism_genome_assembly/nobackups/sample_info.tab

Dataset S1

QC

back-reference-qc (Kraken2)
  • use this pipeline to detect non-human contamination of reads.
  • use this tool/pipeline to assess inter-sample contamination.
    • minimal requirement: fastq.gz
  • use this tool/pipeline to assess non-human contamination as well as inter-sample contamination.
  • use this tool/pipeline to assess inter-sample contamination as well as ancestry and relatedness.
    • minimal requirement: bam
  • use this tool/pipeline to assess the quality of a genome assembly.
    • minimal requirement: fastq.gz and the sample's own Illumina reads
    • Internal path
  • use this to verify sex, either per cell or per sample.

Genome assembly

This step produces a fasta file.

Genome alignment

This step produces a BAM and can be achieved via: Internal path

  • For PacBio_HiFi fastq.gz input, the pipeline uses pbmm2.
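
A minimal sketch of what such an alignment call might look like, written as a Python helper that assembles a `pbmm2 align` command. The file names, the thread count, and the choice of the `HIFI` preset are assumptions for illustration; the internal pipeline may use different options.

```python
import shlex


def pbmm2_align_cmd(reference, reads, out_bam, threads=8, sample=None):
    """Build a `pbmm2 align` command for HiFi reads (illustrative only)."""
    cmd = [
        "pbmm2", "align",
        "--preset", "HIFI",   # assumption: HiFi preset; the pipeline may set another
        "--sort",             # emit a coordinate-sorted BAM
        "-j", str(threads),   # number of alignment threads
    ]
    if sample is not None:
        cmd += ["--sample", sample]  # sample name written to the read group
    cmd += [reference, reads, out_bam]
    return cmd


# Hypothetical paths, for illustration only.
cmd = pbmm2_align_cmd("GRCh38.fa", "sample.hifi.fastq.gz", "sample.GRCh38.bam", threads=16)
print(shlex.join(cmd))
```

Building the command as a list keeps quoting explicit and lets the same helper feed `subprocess.run(cmd, check=True)` if the call is run from Python.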

Variant calling

  • PAV: use hifiasm assembly output as input for this. (instructions)
  • pbsv: use pbmm2 output as input for this.
  • Sniffles: use pbmm2 output as input for this.

SV merging

The steps below use Truvari and are run in sequential order.

1. Sampleset merge.

bcftools merge --threads {threads} --merge none --force-samples -O z -o {output.vcf.gz} {input.vcf1.gz} {input.vcf2.gz} {input.vcf3.gz}
truvari collapse -i {input.vcf.gz} -c {output.removed.vcf.gz} --sizemin 0 --sizemax 1000000 -k maxqual --gt het --intra --pctseq 0.90 --pctsize 0.90 --refdist 500 | bcftools sort --max-mem 8G -O z -o {output.collapsed.vcf.gz}

2. Inter-sample merge.

bcftools merge --threads {threads} --merge none --force-samples --file-list {input.vcflist} -O z | bcftools norm --threads 15 --do-not-normalize --multiallelics -any --output-type z -o {output.mergevcf.gz}
truvari collapse --input {input.mergevcf.gz} --collapsed-output {output.removed_vcf.gz} --sizemin 0 --sizemax 1000000 --pctseq 0.90 --pctsize 0.90 --keep common --gt all | bcftools sort --max-mem {resources}G --output-type z > {output.collapsed_vcf.gz}
python rareSVpool.py {input.collapsed_sv}
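
Both `truvari collapse` calls above treat two SV records as the same event only if they are similar enough. The sketch below illustrates the size and position part of that comparison with the thresholds from step 1 (`--pctsize 0.90`, `--refdist 500`). It is a simplified stand-in, not Truvari's implementation: the `SV` type and the function names are hypothetical, size similarity is taken as the smaller length over the larger, breakpoint proximity is reduced to a start/end distance check, and the sequence-similarity test (`--pctseq`) is omitted.

```python
from dataclasses import dataclass


@dataclass
class SV:
    chrom: str
    start: int
    end: int
    svtype: str
    length: int  # absolute SV length in bp


def size_similarity(a, b):
    """Ratio of the smaller SV length to the larger (1.0 means identical size)."""
    longer = max(a.length, b.length)
    return min(a.length, b.length) / longer if longer else 1.0


def could_collapse(a, b, pctsize=0.90, refdist=500):
    """Simplified test: same chromosome and type, nearby breakpoints, similar size."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.start - b.start) > refdist or abs(a.end - b.end) > refdist:
        return False
    return size_similarity(a, b) >= pctsize


# Made-up records, for illustration only.
a = SV("chr1", 10_000, 10_001, "INS", 320)
b = SV("chr1", 10_040, 10_041, "INS", 300)  # 300/320 = 0.9375, 40 bp apart
c = SV("chr1", 10_040, 10_041, "INS", 200)  # 200/320 = 0.625, too different in size

print(could_collapse(a, b))  # True
print(could_collapse(a, c))  # False
```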

4. De novo validation

  • Initial caller support using Truvari
  • Callable region evaluation using BoostSV
  • Genotyping support using kanpig
  • Rare TR expansions/contractions using TRGT
  • Multiple sequence alignment (MSA) using MAFFT
  • Read-based support validation using subseq or notes here
  • Manual inspection using IGV
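
Before the validation steps listed above, a candidate de novo call is typically one where the child carries the alternate allele and neither parent does. The following is a minimal sketch of that genotype screen, assuming simple `GT` strings as found in a VCF. The function names are hypothetical and this is not the project's code.

```python
def allele_indices(gt):
    """Parse a VCF GT string like '0/1' or '1|0' into allele indices (None if missing)."""
    return [None if a == "." else int(a) for a in gt.replace("|", "/").split("/")]


def is_candidate_de_novo(child_gt, father_gt, mother_gt):
    """True if the child has a non-reference allele and both parents are called hom-ref."""
    child = allele_indices(child_gt)
    parents = allele_indices(father_gt) + allele_indices(mother_gt)
    if None in parents:
        return False  # a missing parental genotype cannot rule out inheritance
    child_has_alt = any(a is not None and a > 0 for a in child)
    parents_hom_ref = all(a == 0 for a in parents)
    return child_has_alt and parents_hom_ref


print(is_candidate_de_novo("0/1", "0/0", "0/0"))  # True
print(is_candidate_de_novo("0/1", "0/1", "0/0"))  # False
print(is_candidate_de_novo("0/1", "./.", "0/0"))  # False
```

A screen like this only nominates candidates; the caller, genotyping, read-level, and manual checks in the list are what establish support.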

Annotation (GRCh38)

This step produces a methylation BED file and bigWig files generated from the BED files. Internal path

Citation

For citation, please refer to our paper at: https://www.medrxiv.org/content/10.1101/2025.07.21.25331932v1
