
batch_align.py loads up the whole query fasta into RAM #225

@leoisl

Description


See https://github.com/karel-brinda/mof-search/blob/e8e681b67538c3eadff2e577581a36183cd27303/scripts/batch_align.py#L150-L154

This clearly does not scale when the query FASTA is massive (e.g. read sets). One easy and quick way to save some RAM is to load only the queries that map to the given batch. Should I implement this, @karel-brinda? It is pretty quick to do. Of course, if the whole or most of the read set maps to the batch, we would still load a lot. The only way through this, I think, is to build a FASTA index on the query file, keep only the FASTA IDs in memory, and load the sequences from disk on demand...

This does not matter much if the mof-search use case does not cover read-set mapping, which is what I assumed from the beginning, but I know you've been mapping ONT datasets with it...
