Replies: 1 comment 2 replies
-
Hi @JMartinon, thanks for the detailed discussion example. I haven't spent much time on this because I didn't find it made much difference, but that is likely because I usually use a backend that performs the contraction as transpose + batched GEMM.
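For illustration, here is a minimal NumPy sketch of the transpose + GEMM strategy (not cotengra's actual code path; the shapes are made up). `np.tensordot` implements exactly this idea: it permutes the contracted axes to the edges, reshapes both operands to matrices, and performs a single GEMM.

```python
import numpy as np

A = np.random.rand(3, 4, 5)
B = np.random.rand(5, 4, 6)

# Contract axis 1 of A with axis 1 of B, and axis 2 of A with axis 0 of B:
C = np.tensordot(A, B, axes=[(1, 2), (1, 0)])
assert C.shape == (3, 6)

# The same result written out explicitly as transpose + reshape + GEMM;
# B is permuted so its contracted axes line up with A's in the same order:
C2 = A.reshape(3, 20) @ B.transpose(1, 0, 2).reshape(20, 6)
assert np.allclose(C, C2)
```

The transposes are where the layout cost discussed below comes in: if the contracted axes are already at the edges, the permutation is free.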
-
Hello,
I'm currently working on a C++ project where I plan to use Cotengra to generate an optimized contraction tree and then perform the contractions with the TBLIS library.
Using the `tree.print_contractions()` method I get the cascade of required binary contractions, together with details on the indices, needed to perform the contraction of my network. Let's look at the following example appearing in my contraction tree:

Here the index `g` is contracted while the indices `{xkmn}` are batched. TBLIS is happy to deal with this index ordering directly (and that's why this library is great), but we can also consider a more general GEMM approach. Either way, this data layout is far from optimal.
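As a concrete sketch of such a batched contraction (all index sizes are invented small values, and this is only one possible axis order), the einsum below is equivalent to a single batched matmul once the right-hand free indices `{abcl}` are flattened:

```python
import numpy as np

# Hypothetical small sizes for the indices in the example above:
x, k, m, n = 2, 2, 2, 2      # batch indices {xkmn}
g = 3                        # contracted index [g]
h = 4                        # left free index (M)
a, b, c, l = 2, 2, 2, 2      # right free indices (N)

rng = np.random.default_rng(0)
T1 = rng.random((x, k, m, n, h, g))
T2 = rng.random((x, k, m, n, g, a, b, c, l))

# The batched binary contraction, written directly as an einsum:
out = np.einsum("xkmnhg,xkmngabcl->xkmnhabcl", T1, T2)

# The same contraction as a batched GEMM: flatten {abcl} into one N axis
# and let matmul broadcast over the leading batch axes {xkmn}.
rhs = T2.reshape(x, k, m, n, g, a * b * c * l)
out2 = np.matmul(T1, rhs).reshape(x, k, m, n, h, a, b, c, l)
assert np.allclose(out, out2)
```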
I'm going to consider a column-major layout (so leftmost index contiguous). In the example above the index `g` is contiguous in neither the left nor the right input tensor; moreover, the one contiguous index, `x`, is a batch index. If we instead use a layout where the four batch indices `{xkmn}` are the outermost ones, the calculation reduces to a matrix-like product M×K · K×N = M×N, with M=h, K=g and N=abcl, leading to a major performance boost.

This is a simple approach and can easily be automated. But here's the catch: in the example above the tensor T2 is an input tensor of my network, so I can fill its data in whatever layout I want. T1, however, is an intermediate tensor generated by a previous contraction, so its data layout depends on that contraction. If we optimize the index ordering of this contraction, it constrains the index ordering of the previous one (and of the next one if T1 is reused), and so on.
So we have to sacrifice good ordering for the small contractions in order to get optimal ordering for the huge ones, losing a little time to save a lot. A more refined analysis would be required to find the overall best set of index orderings.
It would be great to be able to feed the tensor network to Cotengra, allow freedom in the index ordering of the input tensors, and then have the library find an optimized set of index orderings, something like `tree.print_contractions(opt_indices=COL_MAJOR)`! Do you think such a feature is worth considering? Thank you very much for your work!
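As a rough sketch of what such an automated reordering might do for a single binary contraction, assuming one character per index: the `reorder_for_gemm` helper below is purely hypothetical (my own naming and logic, not a cotengra feature), and "first" here means outermost in a row-major view — mirror it for column-major.

```python
def reorder_for_gemm(left, right, output):
    """Reorder the index strings of one binary contraction so that batch
    indices come first, followed by M/K (left) or K/N (right) blocks.

    Hypothetical helper, not part of cotengra's API.
    """
    batch = [i for i in output if i in left and i in right]
    contracted = [i for i in left if i in right and i not in output]
    m_only = [i for i in left if i not in right]      # M block
    n_only = [i for i in right if i not in left]      # N block
    new_left = "".join(batch + m_only + contracted)   # batch, M, K
    new_right = "".join(batch + contracted + n_only)  # batch, K, N
    new_output = "".join(batch + m_only + n_only)     # batch, M, N
    return new_left, new_right, new_output

# Example matching the indices discussed above:
print(reorder_for_gemm("hgxkmn", "gabclxkmn", "habclxkmn"))
# -> ('xkmnhg', 'xkmngabcl', 'xkmnhabcl')
```

The hard part the helper ignores is exactly the constraint propagation described above: the output ordering it picks for one contraction fixes the input ordering of the next.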