
Casanovo-DB potential performance improvements #402

@bittremieux

  • Warn when PTMs that the model does not support are used (e.g., when carbamidomethylation of C is not specified as a fixed modification) [easy]
  • Multithreading during FASTA digestion and m/z calculation [easy]
  • Refactor candidate selection so that it does not search from scratch for every spectrum but instead uses some form of running index [moderate within batches; hard across batches, as that would require modifications to the data loader]
  • mzTab export should report protein database information [easy]
  • Spectrum progress bar should show more granular updates [moderate]
  • Remove the superfluous calculation of predicted peptide m/z, which can be derived from the database [moderate]
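As a rough illustration of the PTM warning, the check could compare every residue token the database search might emit against the model's vocabulary. This is a minimal sketch, not Casanovo's code: `MODEL_VOCAB`, the `Residue[ModName]` token format, and `check_modifications` are hypothetical, and fixed and variable modifications are assumed to arrive as plain dictionaries.

```python
import warnings

# Hypothetical model vocabulary: residue tokens the trained model can score.
# Bare "C" is deliberately absent, as if the model had only seen carbamidomethylated C.
MODEL_VOCAB = {
    "G", "A", "S", "P", "V", "T", "C[Carbamidomethyl]", "L", "I", "N", "D",
    "Q", "K", "E", "M", "H", "F", "R", "Y", "W", "M[Oxidation]",
}

STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def check_modifications(fixed_mods, variable_mods, vocab=MODEL_VOCAB):
    """Warn about residue tokens the search could produce but the model does not know.

    fixed_mods: residue -> modification name, applied to every occurrence.
    variable_mods: residue -> list of modification names.
    Returns the sorted list of unsupported tokens.
    """
    tokens = set()
    for aa in STANDARD_RESIDUES:
        if aa in fixed_mods:
            # A fixed modification replaces the bare residue entirely.
            tokens.add(f"{aa}[{fixed_mods[aa]}]")
        else:
            tokens.add(aa)
            for mod in variable_mods.get(aa, []):
                tokens.add(f"{aa}[{mod}]")
    unsupported = sorted(tokens - vocab)
    for token in unsupported:
        warnings.warn(f"Residue token {token!r} is not supported by the model", UserWarning)
    return unsupported
```

With this vocabulary, omitting the fixed C modification flags the bare "C" token, while `check_modifications({"C": "Carbamidomethyl"}, {"M": ["Oxidation"]})` raises no warning.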
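FASTA digestion and mass calculation are CPU-bound, so in Python plain threads are unlikely to help much (the GIL serializes pure-Python bytecode); the sketch below therefore swaps threads for a process pool over proteins. It assumes a simple tryptic rule (cleave after K/R unless followed by P), monoisotopic residue masses, and hypothetical function names; it is not the existing Casanovo-DB digestion code. Deduplicating and sorting by mass at the end also prepares the peptide list for mass-based candidate retrieval.

```python
import re
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Monoisotopic residue masses in Da (unmodified standard amino acids only;
# non-standard letters such as X or U would raise KeyError in this sketch).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
H2O_MASS = 18.010565

# Zero-width split point after K or R, unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(protein, missed_cleavages=1, min_len=6, max_len=30):
    fragments = [f for f in TRYPSIN_SITE.split(protein) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` additional consecutive fragments.
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[i : j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


def peptide_mass(peptide, fixed_mods=None):
    """Neutral monoisotopic mass; fixed_mods maps residue -> mass shift in Da."""
    fixed_mods = fixed_mods or {}
    return H2O_MASS + sum(RESIDUE_MASS[aa] + fixed_mods.get(aa, 0.0) for aa in peptide)


def _digest_one(task):
    # Module-level worker so ProcessPoolExecutor can pickle it.
    protein, missed_cleavages, fixed_mods = task
    return [(p, peptide_mass(p, fixed_mods)) for p in tryptic_peptides(protein, missed_cleavages)]


def digest_database(proteins, missed_cleavages=1, fixed_mods=None,
                    max_workers=None, executor_cls=ProcessPoolExecutor):
    """Digest proteins in parallel; return unique (peptide, mass) pairs sorted by mass.

    Pass executor_cls=ThreadPoolExecutor to run everything in one process when debugging.
    """
    tasks = [(protein, missed_cleavages, fixed_mods) for protein in proteins]
    with executor_cls(max_workers=max_workers) as pool:
        per_protein = list(pool.map(_digest_one, tasks, chunksize=64))
    unique = {}
    for peptides in per_protein:
        for peptide, mass in peptides:
            unique[peptide] = mass
    return sorted(unique.items(), key=lambda item: item[1])
```

For example, `digest_database(proteins, fixed_mods={"C": 57.021464})` applies carbamidomethylation to every C. With the spawn start method (the default on macOS and Windows), the call should sit under an `if __name__ == "__main__":` guard.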
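If neutral peptide masses are already stored alongside the digested database, the predicted precursor m/z for charge z follows directly as (M + z × 1.007276) / z, with 1.007276 Da as the proton mass, so the sequence never needs to be re-tokenized. A minimal sketch, assuming a hypothetical `mass_lookup` dict from peptide sequence to neutral mass:

```python
PROTON_MASS = 1.007276  # Da


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def predicted_mzs(peptides, charges, mass_lookup):
    """Reuse stored neutral masses instead of recomputing them from sequences."""
    return [precursor_mz(mass_lookup[p], z) for p, z in zip(peptides, charges)]
```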

Open-ended evaluations:

  • Profile to understand where the runtime and memory bottlenecks are; PSM batch creation is a likely contributor. Candidate retrieval can probably be optimized using a sliding-window approach.
  • Investigate whether _calc_match_score can be harmonized between de novo and DB modes.
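One way to realize both the running-index and the sliding-window ideas: with peptide masses sorted ascending and spectra visited in order of increasing precursor mass, both window bounds only ever move forward. Apart from sorting the spectra, a batch then costs O(P + S + K) for P peptides, S spectra, and K returned candidates, instead of a fresh search per spectrum. This is a minimal sketch assuming a symmetric ppm tolerance on the neutral precursor mass and no isotope-error handling; `candidates_sliding_window` is a hypothetical name. Carrying `lo` and `hi` across batches would additionally require the data loader to yield spectra sorted by precursor mass.

```python
def candidates_sliding_window(peptide_masses, precursor_masses, tol_ppm=20.0):
    """Return, for each spectrum (in input order), the indices of candidate peptides.

    peptide_masses must be sorted ascending; precursor_masses may be in any order.
    """
    order = sorted(range(len(precursor_masses)), key=precursor_masses.__getitem__)
    result = [None] * len(precursor_masses)
    n = len(peptide_masses)
    lo = hi = 0  # window is peptide_masses[lo:hi]; both pointers only move forward
    for i in order:
        mass = precursor_masses[i]
        tol = mass * tol_ppm * 1e-6
        lower, upper = mass - tol, mass + tol
        # Drop peptides that are too light for this (and every later) spectrum.
        while lo < n and peptide_masses[lo] < lower:
            lo += 1
        # The previous window may end before the new lower bound.
        if hi < lo:
            hi = lo
        # Admit peptides up to the upper bound.
        while hi < n and peptide_masses[hi] <= upper:
            hi += 1
        result[i] = list(range(lo, hi))
    return result
```

For instance, with peptide masses `[100.0, 200.0, 200.001, 300.0]`, precursor masses `[200.0, 100.0, 250.0]`, and a 10 ppm tolerance, the result is `[[1, 2], [0], []]`.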
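For the profiling item, the standard library's `cProfile` and `tracemalloc` can give a first picture of where time and Python-level memory go, for example around PSM batch creation. A minimal sketch with a hypothetical `profile_call` helper; note that `tracemalloc` only sees memory allocated through Python's allocators, so GPU memory in particular is not covered, and both tools add overhead to the measured call.

```python
import cProfile
import io
import pstats
import tracemalloc


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile and tracemalloc.

    Returns (result, text report of the `top` entries by cumulative time,
    peak traced Python memory in bytes).
    """
    profiler = cProfile.Profile()
    tracemalloc.start()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        # Stop both tools even if the profiled call raises.
        profiler.disable()
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue(), peak_bytes
```

A call such as `_, report, peak = profile_call(some_function, batch)` followed by `print(report)` lists the most expensive calls by cumulative time.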

Labels: enhancement (New feature or request)