Replies: 1 comment 2 replies
-
Hi @JMartinon, thanks for the detailed discussion example. I haven't spent much time on this because I didn't find it made much difference, but that is likely because I usually use a backend that performs the contraction as transpose + batched GEMM.
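For illustration, here is a minimal NumPy sketch of the transpose + GEMM strategy (not cotengra's actual code path; the shapes are made up). `np.tensordot` implements exactly this idea: it permutes the contracted axes to the edges, reshapes both operands to matrices, and performs a single GEMM.

```python
import numpy as np

A = np.random.rand(3, 4, 5)
B = np.random.rand(5, 4, 6)

# Contract axis 1 of A with axis 1 of B, and axis 2 of A with axis 0 of B:
C = np.tensordot(A, B, axes=[(1, 2), (1, 0)])
assert C.shape == (3, 6)

# The same result written out explicitly as transpose + reshape + GEMM;
# B is permuted so its contracted axes line up with A's in the same order:
C2 = A.reshape(3, 20) @ B.transpose(1, 0, 2).reshape(20, 6)
assert np.allclose(C, C2)
```

The transposes are where the layout cost discussed below comes in: if the contracted axes are already at the edges, the permutation is free.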
-
Hello,
I'm currently working on a C++ project where I plan to use Cotengra to generate an optimized contraction tree and then perform the contractions with the TBLIS library.
Using the `tree.print_contractions()` method I get the cascade of required binary contractions, together with details on the indices, needed to perform the contraction of my network. Let's look at the following example appearing in my contraction tree:

Here the index `g` is contracted while the indices `{xkmn}` are batched. TBLIS is happy to deal with this index ordering directly (and that's why this library is great), but we can also consider a more general GEMM approach. Either way, this data layout is far from optimal.
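As a concrete sketch of such a batched contraction (all index sizes are invented small values, and this is only one possible axis order), the einsum below is equivalent to a single batched matmul once the right-hand free indices `{abcl}` are flattened:

```python
import numpy as np

# Hypothetical small sizes for the indices in the example above:
x, k, m, n = 2, 2, 2, 2      # batch indices {xkmn}
g = 3                        # contracted index [g]
h = 4                        # left free index (M)
a, b, c, l = 2, 2, 2, 2      # right free indices (N)

rng = np.random.default_rng(0)
T1 = rng.random((x, k, m, n, h, g))
T2 = rng.random((x, k, m, n, g, a, b, c, l))

# The batched binary contraction, written directly as an einsum:
out = np.einsum("xkmnhg,xkmngabcl->xkmnhabcl", T1, T2)

# The same contraction as a batched GEMM: flatten {abcl} into one N axis
# and let matmul broadcast over the leading batch axes {xkmn}.
rhs = T2.reshape(x, k, m, n, g, a * b * c * l)
out2 = np.matmul(T1, rhs).reshape(x, k, m, n, h, a, b, c, l)
assert np.allclose(out, out2)
```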
I'm going to consider a column-major layout (so leftmost index contiguous). In the example above the index `g` is contiguous in neither the left nor the right input tensor; moreover, the one contiguous index, `x`, is a batch index. If we instead use a layout where the four batch indices `{xkmn}` are the outermost ones, the calculation reduces to a matrix-like product M×K · K×N = M×N, with M=h, K=g and N=abcl, leading to a major performance boost.

This is a simple approach and can easily be automated. But here's the catch: in the example above the tensor T2 is an input tensor of my network, so I can fill its data in whatever layout I want. T1, however, is an intermediate tensor generated by a previous contraction, so its data layout depends on that contraction. If we optimize the index ordering of this contraction, it constrains the index ordering of the previous one (and of the next one if T1 is reused), and so on.
So we have to sacrifice good ordering for the small contractions in order to get optimal ordering for the huge ones, losing a little time to save a lot. A more refined analysis would be required to find the overall best set of index orderings.
It would be great to be able to feed the tensor network to Cotengra, allow freedom in the index ordering of the input tensors, and then have the library find an optimized set of index orderings, something like `tree.print_contractions(opt_indices=COL_MAJOR)`! Do you think such a feature is worth considering? Thank you very much for your work!
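As a rough sketch of what such an automated reordering might do for a single binary contraction, assuming one character per index: the `reorder_for_gemm` helper below is purely hypothetical (my own naming and logic, not a cotengra feature), and "first" here means outermost in a row-major view — mirror it for column-major.

```python
def reorder_for_gemm(left, right, output):
    """Reorder the index strings of one binary contraction so that batch
    indices come first, followed by M/K (left) or K/N (right) blocks.

    Hypothetical helper, not part of cotengra's API.
    """
    batch = [i for i in output if i in left and i in right]
    contracted = [i for i in left if i in right and i not in output]
    m_only = [i for i in left if i not in right]      # M block
    n_only = [i for i in right if i not in left]      # N block
    new_left = "".join(batch + m_only + contracted)   # batch, M, K
    new_right = "".join(batch + contracted + n_only)  # batch, K, N
    new_output = "".join(batch + m_only + n_only)     # batch, M, N
    return new_left, new_right, new_output

# Example matching the indices discussed above:
print(reorder_for_gemm("hgxkmn", "gabclxkmn", "habclxkmn"))
# -> ('xkmnhg', 'xkmngabcl', 'xkmnhabcl')
```

The hard part the helper ignores is exactly the constraint propagation described above: the output ordering it picks for one contraction fixes the input ordering of the next.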