
Indels can cause spurious site production #27

@bricoletc


This is linked to #15, as it leads to the production of ambiguous PRGs (multiple paths give rise to the same sequence).

The clustering code can conclude that there is no meaningful clustering of a set of sequences, e.g. when it puts each sequence in one cluster.

However, the code that calls the clustering can re-run PRG building on sets of sequences that differ only in alignment (i.e. gap positioning), not in sequence (here).
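One way to avoid re-running PRG building on such sets is to compare the sequences after stripping alignment gaps. The sketch below is a minimal illustration, assuming gaps are encoded as `-` and sequences arrive as plain strings; `ungap`, `differ_only_in_gaps` and `dedup_by_sequence` are hypothetical helpers, not functions from this codebase.

```python
from typing import Dict, List

GAP = "-"  # assumption: alignment gaps are encoded as '-'


def ungap(aligned_seq: str) -> str:
    """Return the sequence with alignment gap characters removed."""
    return aligned_seq.replace(GAP, "")


def differ_only_in_gaps(aligned_seqs: List[str]) -> bool:
    """True if all rows spell the same sequence once gaps are removed,
    i.e. they differ (at most) in gap placement."""
    return len({ungap(s) for s in aligned_seqs}) <= 1


def dedup_by_sequence(aligned_seqs: List[str]) -> List[str]:
    """Keep one aligned representative per distinct ungapped sequence,
    preserving first-seen order."""
    representatives: Dict[str, str] = {}
    for s in aligned_seqs:
        representatives.setdefault(ungap(s), s)
    return list(representatives.values())


if __name__ == "__main__":
    rows = ["AC-GT", "ACG-T", "A-CGT"]
    print(differ_only_in_gaps(rows))  # True: all three spell ACGT
    print(dedup_by_sequence(rows + ["ACCGT"]))  # ['AC-GT', 'ACCGT']
```

A caller could, for example, skip recursion when `differ_only_in_gaps` returns `True`, or deduplicate rows before building alleles; whether either matches the planned fix is not stated in this issue.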

This leads to spurious 'nested variants', as in the following pathological example:
msp6_ambig.pdf

Of the 4 paths between the nodes labeled 55 and 56, two are identical. This causes gramtools to mis-genotype downstream.
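A check along the following lines could flag such ambiguous sites before they reach gramtools. It is a sketch under an assumed, simplified site model (a site is a list of alleles; each allele is a list of parts, each part either a literal sequence or a nested site); `spell_paths` and `duplicated_paths` are hypothetical names, and a real PRG would need its own path enumeration.

```python
from collections import Counter
from itertools import product
from typing import List, Union

# Hypothetical, simplified site model (assumption, not the real PRG structure):
# a site is a list of alleles; an allele is a list of parts, where each part
# is either a literal sequence (str) or a nested site (list).
Part = Union[str, "Site"]
Allele = List[Part]
Site = List[Allele]


def spell_paths(site: Site) -> List[str]:
    """Enumerate the sequence spelled by every path through a site,
    expanding nested sites recursively."""
    paths: List[str] = []
    for allele in site:
        # Each part contributes a list of alternative sequences.
        options = [[p] if isinstance(p, str) else spell_paths(p) for p in allele]
        # Every combination of alternatives is one path through this allele.
        for combo in product(*options):
            paths.append("".join(combo))
    return paths


def duplicated_paths(site: Site) -> List[str]:
    """Return sequences spelled by more than one path; a non-empty
    result means the site is ambiguous."""
    counts = Counter(spell_paths(site))
    return [seq for seq, n in counts.items() if n > 1]


if __name__ == "__main__":
    # 4 paths, two of which spell the same sequence (illustrative data only).
    site = [["CT"], ["G", [["A"], ["T"], ["A"]]]]
    print(spell_paths(site))  # ['CT', 'GA', 'GT', 'GA']
    print(duplicated_paths(site))  # ['GA']
```

Path enumeration grows multiplicatively with nesting, so this is only practical as a diagnostic on individual sites.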

I have a fix that I'm implementing and will submit as a PR.

Labels: bug (Something isn't working)