Clarification on the OP_CPY operation src0->src1 #1314

josemonsalve2 · 2025-07-24T23:56:04Z


Hi,

I was reviewing the generated graph for Llama 4, and there appears to be an issue with the implementation of OP_CPY.

The relevant code in the backward pass suggests that the copy goes src0 -> src1 and that dst is not really used:

```c
case GGML_OP_CPY: {
    // cpy overwrites value of src1 by src0 and returns view(src1)
    // the overwriting is mathematically equivalent to:
    // tensor = src0 * 1 + src1 * 0
    if (src0_needs_grads) {
        // dsrc0 = dtensor * 1
        ggml_add_or_set(ctx, cgraph, isrc0, ggml_reshape(ctx, grad, src0));
    }
    if (src1_needs_grads) {
        // dsrc1 = dtensor * 0 -> noop
    }
}
```
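A minimal toy sketch of the semantics that comment describes may help: the copy writes src0's values into src1's buffer and returns a view whose data pointer aliases src1, so no new buffer is allocated and there is no separate dst storage. The `toy_tensor` type and `toy_cpy` function are hypothetical simplifications for illustration, not the ggml API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified tensor (NOT ggml's struct ggml_tensor):
   a borrowed float buffer plus an element count. */
typedef struct {
    float  *data;
    size_t  n;
} toy_tensor;

/* Overwrite src1 with src0 and return a view of src1.
   The view owns no memory of its own: its data pointer aliases src1->data,
   mirroring "cpy overwrites value of src1 by src0 and returns view(src1)". */
static toy_tensor toy_cpy(const toy_tensor *src0, toy_tensor *src1) {
    assert(src0->n == src1->n); /* this toy requires equal element counts */
    memcpy(src1->data, src0->data, src0->n * sizeof(float));
    toy_tensor view = { src1->data, src1->n };
    return view;
}
```

Because the returned view shares src1's buffer, a consumer that reads src1 directly sees the copied values, yet none of its inputs points at the copy node.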

In the resulting graph, there is a dependency that is never made explicit:

[Figure: excerpt of the generated graph. The red line is an implicit dependency; the blue node has no output dependencies.]

I am curious why it was designed this way rather than as src0 -> dst.

I imagine this is more efficient because the result is a view rather than an extra copy. But it gets in the way of dependency analysis, because the dependency is never made explicit in the graph. This is not currently a problem, because the order of the tensor IDs guarantees the implicit dependency (e.g., in the figure, the copy is ID 14 and the consumer is ID 20), so during evaluation 14 and 20 are never executed out of order or in parallel.
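One way a dependency analysis could recover the missing edge is to track which tensor actually owns the memory a CPY writes into, and make any later reader of that memory depend on the CPY. The sketch below uses a hypothetical, simplified graph model: `toy_node`, `base_of`, and `build_deps` are illustrative names, not ggml's API, although the `view_src` field is loosely modeled on the one ggml tensors carry. It relies on the same assumption the current evaluation already relies on (tensor IDs are listed in a valid execution order) and only adds read-after-write edges; write-after-read hazards, such as a reader of src1 that must finish before the copy overwrites it, would need a similar pass.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_T 32

/* Hypothetical, simplified graph model (not ggml's data structures). */
typedef enum { OP_NONE, OP_ADD, OP_CPY } toy_op;

typedef struct {
    toy_op op;
    int    src[2];   /* input tensor ids, -1 if unused */
    int    view_src; /* id of the tensor whose memory this one aliases, -1 if none */
} toy_node;

/* Follow view_src links to the tensor that actually owns the memory. */
static int base_of(const toy_node *t, int id) {
    while (t[id].view_src >= 0) id = t[id].view_src;
    return id;
}

/* dep[a][b] == true means tensor b must run after tensor a.
   Tensors are assumed to be listed in a valid execution order (ascending ids). */
static void build_deps(const toy_node *t, int n, bool dep[MAX_T][MAX_T], bool alias_aware) {
    assert(n <= MAX_T);
    int last_writer[MAX_T]; /* last in-place writer of each owning buffer, -1 if none */
    for (int i = 0; i < n; i++) {
        last_writer[i] = -1;
        for (int j = 0; j < n; j++) dep[i][j] = false;
    }
    for (int j = 0; j < n; j++) {
        for (int k = 0; k < 2; k++) {
            int s = t[j].src[k];
            if (s < 0) continue;
            dep[s][j] = true; /* explicit edge: input -> consumer */
            if (alias_aware) {
                /* implicit edge: earlier in-place writer of that memory -> reader */
                int w = last_writer[base_of(t, s)];
                if (w >= 0 && w != j) dep[w][j] = true;
            }
        }
        /* An in-place CPY writes into the memory of its view source (src1). */
        if (t[j].op == OP_CPY) last_writer[base_of(t, j)] = j;
    }
}
```

In a four-tensor graph where tensor 2 is a CPY into tensor 1 and tensor 3 reads tensor 1 directly (analogous to IDs 14 and 20 in the figure), the source-only pass records no edge from 2 to 3, while the alias-aware pass adds it.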

Any insights here would be appreciated.
