Skip to content

Read mapping is very slow on diploid human genome assembly #28

@jeizenga

Description

@jeizenga

I tried to use VerityMap to validate a diploid human genome assembly using HiFi reads, but on my data it was too slow to be practical. I let it run for >3 weeks one 16 threads, and it only mapped up to about 4x. Is this speed expected? Are there any tweaks I can make to increase it?

The command I ran was

python3 main.py --reads reads.fastq.gz -o verity_map_output -t 16 -d hifi-diploid \
    assembly.haplotype1.fasta assembly.haplotype2.fasta

Another question/request: I understand from the paper that VerityMap also includes analysis modules to detect the location of misassemblies. As far as I can see, these can only be accessed after read mapping concludes (I believe the relevant code is here). Is this correct? It would be useful if the interface allowed a more modular option that could be run independently of mapping, especially since it seems like I will need to troubleshoot the mapping stage.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions