basic scripts to perform small variant analysis on a server
We aimed at producing gVCF data or call all genomes simultaneously in order to handle Ref-calls (same as reference genome) and No-calls (no read support) and obtain a better multigenome dataset. It is also possible to call each genome to VCF and merge the VCF files but doing so the No-calls will be represented as Ref-calls which is not good and is what we get below with freebayes-parallel
.
NOTE: We use here bcftools mpileup
(with new arguments) rather than the old samtools mpileup
syntax used in older tutorials.
NOTE: Freebayes is used here as it is a very poular tool. However freebayes does not (yet) seem to produce gVCF to a level comparable to other tools, we are therefore producing separate VCF files here and merging them with bcftools. A particular issue is when a variant in one genome is a SNV while an indel is present in another genome. In such case, merging will not reflect the complexity of the situation and might lead to call issues.