Autism Susceptibility Analysis Pipeline with a focus on Structural Variants.
This repo logs the tasks involved in this project, executed either sequentially or asynchronously.
- Inputs
- QC
- Genome assembly
- Genome alignment
- Variant calling
- SV merging
- Annotation
- Methylation
- Housekeeping
- Citation
This pertains to the SSC + SAGE + Rett-like cohort (189 individuals from 51 families).
Count | Sex (proband-sibling) | Family type |
---|---|---|
12 | F-F | quad |
16 | F-M | quad |
3 | M-F | quad |
5 | M-M | quad |
13 | F | trio |
2 | M | trio |
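
The totals quoted above can be checked against the table with a short sketch. It assumes that a quad contributes four individuals (two parents, proband, sibling) and a trio three (two parents, proband); the variable names are illustrative only.

```python
# Rows copied from the table above: (count, sex, family type).
rows = [
    (12, "F-F", "quad"),
    (16, "F-M", "quad"),
    (3, "M-F", "quad"),
    (5, "M-M", "quad"),
    (13, "F", "trio"),
    (2, "M", "trio"),
]

# Assumption: a quad has 4 members and a trio has 3.
FAMILY_SIZE = {"quad": 4, "trio": 3}

families = sum(count for count, _, _ in rows)
individuals = sum(count * FAMILY_SIZE[kind] for count, _, kind in rows)

print(families, individuals)  # 51 189
```
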
Sample sheet is here: /net/eichler/vol28/projects/autism_genome_assembly/nobackups/sample_info.tab
Dataset S1
back-reference-qc (Kraken2)
- use this pipeline to assess non-human contamination of reads.
- minimal requirement: fastq.gz
- Internal path
- use this tool/pipeline to assess inter-sample contamination.
- minimal requirement: fastq.gz
- use this tool/pipeline to assess non-human contamination as well as inter-sample contamination.
- minimal requirement: bam
- Internal path
- use this tool/pipeline to assess inter-sample contamination as well as ancestry and relatedness.
- minimal requirement: bam
- use this tool/pipeline to assess quality of genome assembly
- minimal requirement: fastq.gz and the sample's own Illumina data
- Internal path
- use this to verify sex, either per cell or per sample
- minimal requirement: bam
- click here for notes
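
For the sex-verification step, one common heuristic (not necessarily what the tool referenced above does) compares read density on chrX and chrY with an autosome. The sketch below parses text in the `samtools idxstats` layout (contig name, length, mapped reads, unmapped reads). The function name, the `chr1`/`chrX`/`chrY` contig names, and the `y_threshold` cutoff are assumptions for illustration, not values from this project.

```python
def infer_sex(idxstats_text, autosome="chr1", y_threshold=0.1):
    """Guess sex from per-contig mapped-read density.

    y_threshold is a hypothetical cutoff on (chrY density / autosome density).
    """
    density = {}
    for line in idxstats_text.strip().splitlines():
        name, length, mapped, _unmapped = line.split("\t")
        if int(length) > 0:  # skips the trailing "*" line for unplaced reads
            density[name] = int(mapped) / int(length)
    x_ratio = density["chrX"] / density[autosome]
    y_ratio = density["chrY"] / density[autosome]
    sex = "M" if y_ratio > y_threshold else "F"
    return sex, x_ratio, y_ratio


# Toy input in idxstats layout (made-up numbers, not real data).
male_like = "\n".join([
    "chr1\t1000\t10000\t0",
    "chrX\t500\t2500\t0",
    "chrY\t200\t900\t0",
    "*\t0\t0\t42",
])
female_like = "\n".join([
    "chr1\t1000\t10000\t0",
    "chrX\t500\t5000\t0",
    "chrY\t200\t20\t0",
    "*\t0\t0\t42",
])
print(infer_sex(male_like)[0], infer_sex(female_like)[0])
```

Any real cutoff would need to be calibrated against samples of known sex.
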
This step produces a fasta file.
- hifiasm: use this pipeline/tool to assemble sample genome (trio-phased requires parental Illumina data as input).
- Version used for all our samples: hifiasm 0.16.1 with just HiFi data.
- fix-sex-chromosome: use this pipeline to fix partially phased autism family fathers.
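
hifiasm writes its assemblies as GFA, so obtaining the fasta mentioned above involves extracting the sequence records. The sketch below shows one way to do that in Python, assuming standard GFA `S` (segment) lines of the form `S<TAB>name<TAB>sequence`. The function name, the example contig name, and the line width are illustrative choices, not part of this repo's pipeline.

```python
def gfa_to_fasta(gfa_lines, width=60):
    """Yield FASTA lines for every segment (S) record in a GFA stream."""
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip header, link, and other non-segment records
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":
            continue  # GFA allows a placeholder when no sequence is stored
        yield f">{name}"
        # Emit the sequence in fixed-width chunks.
        for i in range(0, len(seq), width):
            yield seq[i:i + width]


# Tiny made-up GFA for demonstration.
gfa = [
    "H\tVN:Z:1.0\n",
    "S\th1tg000001l\tACGTACGTAC\tLN:i:10\n",
]
print("\n".join(gfa_to_fasta(gfa, width=4)))
```

Because the function is a generator over lines, it can be fed an open file handle directly without loading a whole assembly into memory at once.
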
This step produces a BAM and can be achieved via: Internal path
- Input: PacBio_HiFi fastq.gz; the pipeline aligns reads with pbmm2.
- PAV: use hifiasm assembly output for this. (instructions)
- pbsv: use pbmm2 output as input for this.
- Sniffles: use pbmm2 output as input for this.
The steps here use Truvari and are run in sequential order.
bcftools merge --threads {threads} --merge none --force-samples -O z -o {output.vcf.gz} {input.vcf1.gz} {input.vcf2.gz} {input.vcf3.gz}
truvari collapse -i {input.vcf.gz} -c {output.removed.vcf.gz} --sizemin 0 --sizemax 1000000 -k maxqual --gt het --intra --pctseq 0.90 --pctsize 0.90 --refdist 500 | bcftools sort --max-mem 8G -O z -o {output.collapsed.vcf.gz}
bcftools merge --threads {threads} --merge none --force-samples --file-list {input.vcflist} -O z | bcftools norm --threads 15 --do-not-normalize --multiallelics -any --output-type z -o {output.mergevcf.gz}
truvari collapse --input {input.mergevcf.gz} --collapsed-output {output.removed_vcf.gz} --sizemin 0 --sizemax 1000000 --pctseq 0.90 --pctsize 0.90 --keep common --gt all | bcftools sort --max-mem {resources}G --output-type z > {output.collapsed_vcf.gz}
python rareSVpool.py {input.collapsed_sv}
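
To make the `--pctsize`, `--pctseq`, and `--refdist` thresholds above more concrete, here is a simplified sketch of the kind of pairwise test they imply: two calls are candidates for collapsing when their breakpoints are close, their sizes are similar, and their sequences are similar. This is not Truvari's implementation. In particular, `difflib.SequenceMatcher` stands in for Truvari's own sequence-similarity measure, and the `SV` record type and function names are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class SV:
    chrom: str
    start: int
    end: int
    svtype: str
    seq: str  # inserted or deleted sequence


def size_similarity(a, b):
    """Ratio of the smaller to the larger SV length (1.0 means equal size)."""
    small, large = sorted((len(a.seq), len(b.seq)))
    return small / large if large else 1.0


def seq_similarity(a, b):
    # Stand-in metric (assumption); Truvari computes similarity differently.
    return SequenceMatcher(None, a.seq, b.seq, autojunk=False).ratio()


def same_sv(a, b, pctsize=0.90, pctseq=0.90, refdist=500):
    """Return True when two calls pass all three thresholds."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= refdist
        and abs(a.end - b.end) <= refdist
        and size_similarity(a, b) >= pctsize
        and seq_similarity(a, b) >= pctseq
    )


# Made-up insertions: b is near a and almost identical; c is 2 kbp away.
a = SV("chr1", 10_000, 10_000, "INS", "ACGT" * 25)
b = SV("chr1", 10_120, 10_120, "INS", "ACGT" * 24)
c = SV("chr1", 12_000, 12_000, "INS", "ACGT" * 25)
print(same_sv(a, b), same_sv(a, c))  # True False
```
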
- Initial caller support using Truvari
- Callable region evaluation using BoostSV
- Genotyping support using kanpig
- Rare TR expansions/contractions using TRGT
- Multiple sequence alignment (MSA) using MAFFT
- Read-based support validation using subseq or notes here
- Manual inspection using IGV
- Gene and location annotation using AnnotSV, and then simplified by using sim_annotSV.py
- CADD score using CADD-SV
- Regulatory annotation using REG data and comREG
- Combine all annotations from comREG
This step produces a methylation BED file and bigWig files generated from the BED files. Internal path
For citation, please refer to our paper at: https://www.medrxiv.org/content/10.1101/2025.07.21.25331932v1