True clonal vs Pseudo-clonal segments #2354

cjdjpj · 2025-04-18T00:42:53Z

cjdjpj
Apr 18, 2025

Hi there!

I am trying to detect, for any two samples, when a segment is truly clonally inherited versus when one of the two samples' segments comes from a recombination/gene conversion event that just so happened to coalesce at the same MRCA, at the same time, as the other sample. The two segments would have the same ancestor node and TMRCA, but take different paths. My thought is that this could theoretically happen at high rates in a multiple-merger coalescent.

I've thought about using ts.ibd_segments, which should recognize when one such "hidden gene conversion" is present, but it wouldn't be able to differentiate between clonal and these pseudo-clonal segments, since both possibilities have the same ancestor node.

I'm thinking that incorporating extra gene conversion nodes could differentiate between the two, but I don't know how to integrate them with ibd_segments. Is it possible? Am I missing something?

Any help is greatly appreciated.

Best,
CJ

petrelharp · 2025-04-20T04:33:28Z

petrelharp
Apr 20, 2025
Maintainer

Hi, CJ - to answer the question I think we need some more context? Is this a question about tree sequences produced by msprime?

If so, and you want to be able to always have separate edges for segments that took separate paths through the pedigree, then - I could be wrong, but - I think you need to use record_full_arg=True. Otherwise, what you say can definitely happen and won't be recorded separately.

(Maybe this should be an msprime discussion?)

4 replies

cjdjpj Apr 20, 2025
Author

Thank you! This got me looking in the right places.
Yes I should've mentioned that it is about tree sequences from msprime. Let me know if I should delete/move the discussion.

I simulated a dummy tree sequence with just one gene conversion event to see if I could figure out how to tell the difference (red nodes are gene conversion nodes).

In this example, nodes 6 and 7 both mark a gene conversion event. Node 6 only marks the event and nothing else happens there, whereas from node 7 we follow the gene conversion segment up until it coalesces. Is there a way to tell which node does which? Here it is fairly clear that 5-6 is the GC segment, but is there a way to determine this systematically?

petrelharp Apr 23, 2025
Maintainer

I think your question is answered in this section? Let us know if not?

And - I'll move this over to msprime.

cjdjpj Apr 23, 2025
Author

I don't think so...

Ultimately, I just want to know, for any pair of samples, how much of their genome has never undergone a gene conversion event. In my example above, 0-5 and 6-10 of samples 0 and 4 fulfill this criterion, while 5-6 does not.

The docs describe well what the additional nodes are, but I still don't know how they can be used to do what I want.
I've tried many things with gene conversion nodes, and scoured the docs when none of those worked properly or just couldn't work, but I'll leave it all out since I think it's just complicating my question and I'm not explaining it very well.

What would be the simple idea to do this?

petrelharp Apr 24, 2025
Maintainer

Ah, I see; that makes sense. And, let's see - my email notifications tell me that you wrote and deleted a comment with what looks to me like a great start on answering this question; could you maybe provide some more of the context that you posted in that deleted answer? And perhaps the code, unless it just doesn't work? The main reason I'm asking is because what I would have set out to do is to write just the sort of code you've written already.

cjdjpj · 2025-04-24T14:49:31Z

cjdjpj
Apr 24, 2025
Author

I thought it didn't work, but after testing it again, it does!! Thank you for your help! It was helpful to know I was on the right track. I will close the discussion now.

Basically I noticed that only every second GC node is in the segment that was transferred, while all the other ones mark the complement segments of the genome (that don't come from GC). (Maybe it would be helpful to note this somewhere in the docs? The docs only talk about recombination nodes, where it doesn't really matter which corresponds to left and right, while I had a hard time figuring out which GC nodes corresponded to the transferred segment, or if such a correspondence even existed.)
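A small sketch of the flag test and the every-second-node selection described above, run on made-up flag values rather than a real tree sequence. It compares an equality test with a bitwise test: the two agree when the GC bit is the only bit set, but only the bitwise test still matches if some other bit is set as well. `OTHER_FLAG` is a hypothetical bit used purely for illustration, and the pairing of GC nodes by ID order is the observation from this post, not something confirmed by the docs.

```python
GENE_CONVERSION_FLAG = 1 << 21  # same value as in the code below
OTHER_FLAG = 1 << 3             # hypothetical extra bit, for illustration only

# Made-up flags for nodes 0..6: two samples (flag 1), one plain ancestor,
# then two pairs of GC nodes; the last GC node also carries OTHER_FLAG.
flags = [
    1, 1, 0,
    GENE_CONVERSION_FLAG, GENE_CONVERSION_FLAG,
    GENE_CONVERSION_FLAG, GENE_CONVERSION_FLAG | OTHER_FLAG,
]

by_equality = [u for u, f in enumerate(flags) if f == GENE_CONVERSION_FLAG]
by_bit_test = [u for u, f in enumerate(flags) if f & GENE_CONVERSION_FLAG]

# Every second GC node (in ID order) is taken to be on the transferred segment.
real_gc_nodes = set(by_bit_test[1::2])

print(by_equality)            # [3, 4, 5]
print(by_bit_test)            # [3, 4, 5, 6]
print(sorted(real_gc_nodes))  # [4, 6]
```

Note that if the equality test silently dropped a GC node, the `[1::2]` slice would go out of step for every node after it, so the bitwise form is the safer of the two.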

So I basically walked up the tree and looked for these nodes.
This was the code I wrote before:

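# Assumes `ts` is a tree sequence simulated by msprime with record_full_arg=True
# and `pairs` is an iterable of (i, j) sample ID pairs.
GENE_CONVERSION_FLAG = 1 << 21

# all nodes flagged as gene conversion nodes, in node ID order
gc_nodes = [
    node.id for node in ts.nodes() if node.flags == GENE_CONVERSION_FLAG
]

real_gc_nodes = set(gc_nodes[1::2]) # gc nodes from the transferred segment (only every other node)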

def are_clonal(i, j, real_gc_nodes, tree):
    mrca = tree.mrca(i,j)

    # no real gc node on path from i to mrca
    while i != mrca:
        if i in real_gc_nodes:
            return False
        i = tree.parent(i)

    # no real gc node on path from j to mrca
    while j != mrca:
        if j in real_gc_nodes:
            return False
        j = tree.parent(j)

    return True

frac_trueclonal = []

# how much of each pair of genomes is clonal?
for (i, j) in pairs:
    clonal = sum(tree.span for tree in ts.trees() if are_clonal(i, j, real_gc_nodes, tree))
    frac_trueclonal.append(clonal/ts.sequence_length)
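To sanity-check the walk-up logic without running a simulation, here is a sketch of the same idea on a hand-built tree stored as a child-to-parent dict. The tree, the node IDs, and the choice of node 3 as the "real" GC node are all made up for illustration, and `path_to_root` is a hypothetical helper standing in for `tree.mrca` and `tree.parent`.

```python
NULL = -1

def path_to_root(parent, u):
    """Return the list of nodes from u up to the root, inclusive."""
    path = []
    while u != NULL:
        path.append(u)
        u = parent[u]
    return path

def are_clonal(parent, i, j, real_gc_nodes):
    """True if no real GC node lies strictly below the MRCA on either path."""
    ancestors_j = set(path_to_root(parent, j))
    mrca = next((u for u in path_to_root(parent, i) if u in ancestors_j), NULL)
    if mrca == NULL:
        return False  # no common ancestor in this tree
    for u in (i, j):
        while u != mrca:
            if u in real_gc_nodes:
                return False
            u = parent[u]
    return True

# Toy tree: samples 0, 1, 2; node 3 is a unary GC node above sample 1;
# node 4 joins 0 and 3; node 5 is the root joining 4 and 2.
parent = {0: 4, 1: 3, 3: 4, 4: 5, 2: 5, 5: NULL}
real_gc_nodes = {3}

print(are_clonal(parent, 0, 1, real_gc_nodes))  # False
print(are_clonal(parent, 0, 2, real_gc_nodes))  # True
print(are_clonal(parent, 1, 2, real_gc_nodes))  # False
```

Samples 0 and 1 share node 4 as their MRCA, but sample 1 reaches it through the GC node, so that pair is not counted as clonal in this tree.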

Walking up the tree for every pair and every tree turned out to be way too slow for large amounts of GC, so I instead converted each tree into a graph and cut it at these real GC nodes. Any pair of samples that ends up in the same connected component must be clonal in that tree, so I add the tree's span to that pair's "clonal fraction". This is much faster and gives exactly the same result.

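import math
from itertools import combinations

import numpy as np
import rustworkx as rx
import tskit

# Assumes `mts` is a tree sequence simulated by msprime with record_full_arg=True.
n = mts.num_samples
L = mts.sequence_length
GENE_CONVERSION_FLAG = 1 << 21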

gc_nodes = [u.id for u in mts.nodes() if u.flags == GENE_CONVERSION_FLAG]
real_gc = set(gc_nodes[1::2])

num_pairs = math.comb(n, 2)
span = np.zeros(num_pairs, dtype=float)

# for each marginal tree, find components when you "remove" real_gc nodes (building a graph excluding them)
for tree in mts.trees():
    g = rx.PyGraph()

    node_map = {} # get graph idx for node idx

    for u in tree.nodes():
        if u not in real_gc:
            node_index = g.add_node(u)
            node_map[u] = node_index

    for u in node_map:
        p = tree.parent(u)
        if p != tskit.NULL and p in node_map:
            g.add_edge(node_map[u], node_map[p], None)

    components = rx.connected_components(g)

    # for all pairs in component, add tree span as clonal
    for comp in components:
        sample_nodes = [g[i] for i in comp if tree.is_sample(g[i])]
        for i, j in combinations(sorted(sample_nodes), 2):
            pair_index = int(j - i - 1 + n * i - (i * (i + 1)) // 2)
            span[pair_index] += tree.span

frac_trueclonal = span/L
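The same component idea can be sketched without a graph library, using a small union-find over a child-to-parent dict, together with a check of the pair-index formula used above. This is only a sketch under stated assumptions: the toy tree, the span of 4.0, and the helper names `find`, `clonal_components` and `pair_index` are made up for illustration, and the index formula assumes the sample IDs are 0..n-1.

```python
from itertools import combinations

NULL = -1

def find(root, u):
    """Union-find lookup with path halving."""
    while root[u] != u:
        root[u] = root[root[u]]
        u = root[u]
    return u

def clonal_components(parent, cut_nodes):
    """Connected components of the tree after deleting cut_nodes."""
    kept = [u for u in parent if u not in cut_nodes]
    root = {u: u for u in kept}
    for u in kept:
        p = parent[u]
        if p != NULL and p in root:
            root[find(root, u)] = find(root, p)
    groups = {}
    for u in kept:
        groups.setdefault(find(root, u), []).append(u)
    return list(groups.values())

def pair_index(i, j, n):
    """Position of pair (i, j), i < j, in combinations(range(n), 2) order."""
    return j - i - 1 + n * i - (i * (i + 1)) // 2

# Toy tree: samples 0, 1, 2; node 3 is a "real" GC node above sample 1.
parent = {0: 4, 1: 3, 3: 4, 4: 5, 2: 5, 5: NULL}
samples = {0, 1, 2}
n = len(samples)
tree_span = 4.0  # made-up span for this single tree

span = [0.0] * (n * (n - 1) // 2)
for comp in clonal_components(parent, cut_nodes={3}):
    for i, j in combinations(sorted(u for u in comp if u in samples), 2):
        span[pair_index(i, j, n)] += tree_span

print(span)  # [0.0, 4.0, 0.0]: only the pair (0, 2) is clonal here
```

Cutting at node 3 leaves sample 1 in a component of its own, so only the pair (0, 2) accumulates span, which matches what the walk-up version gives for the same tree.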

1 reply

petrelharp Apr 25, 2025
Maintainer

Yay! Glad to hear it!
