Clarification on the OP_CPY operation src0->src1 #1314
Unanswered
josemonsalve2 asked this question in Q&A
Replies: 0 comments
Hi,
I was reviewing the generated graph for Llama 4, and there appears to be an issue with the implementation of OP_CPY.
In this line, it suggests that this is src0 -> src1, and dst is not really used. When looking at the resulting graph, there is a dependency that does not seem to be realized:
The red line is an implicit dependency. The blue node has no output dependencies.
I am curious why this was designed this way, rather than having src0 -> dst.
I can imagine that this works better (more efficiently) because the result is a view rather than an extra copy. But when doing dependency analysis, this gets in the way: the dependency between the copy and the later consumer of the copied data never shows up as an edge in the graph. This is currently not an issue, because the ordering of tensor IDs preserves the implicit dependency (in the figure, the copy is ID 14, while the consumer is ID 20); therefore, during evaluation, 14 and 20 are never executed out of order or in parallel.
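To make the missing edge concrete, here is a minimal Python sketch of the situation. It is a toy model, not ggml code: the Node class, the helper names, and every tensor ID other than 14 and 20 are hypothetical, and it assumes the consumer reads the copy's destination buffer directly. The first function builds edges only from each node's src list, as a naive dependency analysis would; the second additionally treats src1 of a CPY node as being written by that node.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    op: str
    srcs: list = field(default_factory=list)  # ids of the tensors listed as sources

def explicit_edges(nodes):
    """Edges (producer, consumer) taken only from each node's src list."""
    return {(s, n.id) for n in nodes for s in n.srcs}

def with_cpy_edges(nodes):
    """Same edges, plus one from each CPY node to later readers of its src1."""
    edges = explicit_edges(nodes)
    last_writer = {}  # tensor id -> id of the most recent CPY that wrote into it
    for n in sorted(nodes, key=lambda n: n.id):
        # A CPY node only reads src0; every other node reads all of its srcs.
        reads = n.srcs[:1] if n.op == "CPY" else n.srcs
        for s in reads:
            if s in last_writer:
                edges.add((last_writer[s], n.id))
        if n.op == "CPY":
            last_writer[n.srcs[1]] = n.id  # src1 is the copy's destination
    return edges

# Hypothetical graph: 12 holds new values, 10 is the destination buffer,
# 14 copies 12 into 10, and 20 later reads 10.
nodes = [
    Node(10, "NONE"),
    Node(12, "NONE"),
    Node(14, "CPY", [12, 10]),
    Node(20, "VIEW", [10]),
]

print((14, 20) in explicit_edges(nodes))  # the implicit dependency is missing
print((14, 20) in with_cpy_edges(nodes))  # it appears once src1 counts as an output
```

In this sketch the 14 -> 20 edge only exists in the second analysis. Without it, the only thing keeping 20 after 14 is the ID ordering, which matches the behavior described above.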
Any insights here would be appreciated.