-
Notifications
You must be signed in to change notification settings - Fork 29
Increasing coverage from 10x processing
Caleb Lareau
We consistently observe low-coverage regions in certain mtDNA regions due to homology with nuclear DNA. In the setting of scATAC-seq, we expect to observe significantly more reads truly generating from the mtDNA than the corresponding nuclear segment. Thus, the idea is to hard-mask regions of the nuclear genome that share consistent homology with the mtDNA, which in effect forces observed reads to the mtDNA when aligned.
Several common reference genomes have been provided already, but a workflow for how to
generate your own is discussed in the mitoblacklist repo.
The pre-existing blacklist .bed
files can be easily downloaded from there.
One can follow this code using the reference data from CellRanger 1.0.0 to produce a hard-mask modified reference genome.
One exception is that I couldn't figure out how to make the .flat
and .gdx
files,
nor could I figure out where they were important in the 10X protocol, but the pipeline
evidently does need them. So, I just keep them from the old version.
mv genome.fa old_genome.fa
mv genome.fa.flat old_genome.fa.flat
mv genome.fa.gdx old_genome.fa.gdx
bedtools maskfasta -fi old_genome.fa -bed mito_blacklist.bed -fo genome.fa
# Cleanup old files
rm old_genome.fa
rm genome.fa.*
# Make new reference genomes
bwa index genome.fa
samtools faidx genome.fa
# Put the flat files back; these won't be hard-masked, but they aren't being used by the aligner
mv old_genome.fa.flat genome.fa.flat
mv old_genome.fa.gdx genome.fa.gdx
# Update the timestamp to pass the CellRanger preflight checks
touch -d "now" genome.fa.flat
touch -d "now" genome.fa.gdx
Update: in version 1.2.0 of CellRanger-ATAC, the ability to make a full custom reference
genome is now possible.
Thus, a more natural way of generating this blacklist-compatible reference is to 1) generate the hard-masked .fa
file and
then 2) execute the cellranger-atac mkref
command as suggested.
Please raise an issue here