Array of structs vs struct of arrays #7544
Replies: 2 comments 4 replies
-
I used this colab to compare alphafold-style 3D matrix-vector products against einsum or the @ operator, and I found the alphafold-style (struct-of-arrays) version to be about 2x faster on CPU and 20% faster on GPU (TPU was having a flaky day, as it often does for me, and I didn't get meaningful numbers). I'm somewhat surprised it matters more on CPU than on GPU; and this is but one rather synthetic benchmark, but those are nontrivial differences nonetheless. I also tried to work out whether the memory layout of my device arrays matters at all, but it appears that JAX/XLA currently coerces all device arrays to C-contiguous anyway (as you typically would when programming GPUs, because that's what pretty much all library calls demand), since I cannot seem to make any difference by feeding in either C- or Fortran-ordered arrays, despite the fact that this should matter a lot, at least on the GPU.
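Roughly, the comparison looks like the following minimal sketch (my own reconstruction, not the actual colab; the names, shapes, and `N` are assumptions): it applies N 3x3 matrices to N 3-vectors, once as an array-of-structs einsum, and once written out component by component, struct-of-arrays style.

```python
import timeit

import jax
import jax.numpy as jnp

N = 1_000_000
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
mats = jax.random.normal(k1, (N, 3, 3))  # N 3x3 matrices
vecs = jax.random.normal(k2, (N, 3))     # N 3-vectors

@jax.jit
def matvec_aos(mats, vecs):
    # Array-of-structs: the length-3 component axis is the innermost axis.
    return jnp.einsum('nij,nj->ni', mats, vecs)

# Struct-of-arrays, alphafold-style: one length-N array per component.
vx, vy, vz = vecs[:, 0], vecs[:, 1], vecs[:, 2]
m = [[mats[:, i, j] for j in range(3)] for i in range(3)]

@jax.jit
def matvec_soa(m, vx, vy, vz):
    # The 3x3 product written out by hand, component by component.
    return (m[0][0] * vx + m[0][1] * vy + m[0][2] * vz,
            m[1][0] * vx + m[1][1] * vy + m[1][2] * vz,
            m[2][0] * vx + m[2][1] * vy + m[2][2] * vz)

def run_aos():
    matvec_aos(mats, vecs).block_until_ready()

def run_soa():
    # All three outputs come from the same jitted executable, so blocking
    # on one of them is enough for timing purposes.
    matvec_soa(m, vx, vy, vz)[0].block_until_ready()

run_aos(); run_soa()  # warm up first, so compile time is excluded
print('aos:', timeit.timeit(run_aos, number=100))
print('soa:', timeit.timeit(run_soa, number=100))
```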
-
Is there any method to check the strides of a device array, by the way? None that I can find so far, and Google is pretty quiet on the topic as well. It would be nice to check some assumptions about what is going on under the hood.
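For what it's worth, one way to check some of those assumptions, assuming a JAX version recent enough to expose the `jit(...).lower(...).compile()` ahead-of-time API, is to dump the compiled HLO: XLA annotates each shape with its layout (minor-to-major dimension order), e.g. `f32[8,3]{1,0}`, which is the closest thing to strides I know of.

```python
import jax
import jax.numpy as jnp

def f(mats, vecs):
    return jnp.einsum('nij,nj->ni', mats, vecs)

mats = jnp.ones((8, 3, 3))
vecs = jnp.ones((8, 3))

# Dump the post-compilation HLO; the exact accessor (.as_text() here) may
# vary across JAX versions. Look for layout suffixes like {2,1,0} on the
# shapes: they give the minor-to-major order XLA picked for each buffer.
compiled = jax.jit(f).lower(mats, vecs).compile()
print(compiled.as_text())
```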
-
I've been writing code in JAX that involves length-3 vectors, quaternions, and the like. I've been looking at jax/brax and the recently released alphafold for inspiration on best practices, and I noticed they take different approaches towards the implementation of, for instance, quaternion algebra operations. brax does what amounts to an array-of-structs, but here is the rationale behind alphafold's approach.
That is, they define a vector as

```python
Vecs = collections.namedtuple('Vecs', ['x', 'y', 'z'])
```

rather than as a simple JAX array of size 3, and write code involving such objects accordingly; that is, if we want to sum over components, we need to write that out and cannot call a sum method over an axis (see the sketch below).

What we all want, of course, is to be able to write high-level, expressive code that will compile to something close to optimal on as many backends as possible. It's known that GPUs, at least, prefer a struct-of-arrays memory layout, as that allows the threads in a warp to perform the same operation while their memory accesses coalesce.
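Here is a sketch contrasting the two styles on a dot product (the `Vecs` namedtuple matches the alphafold snippet above; `dot_soa` and `dot_aos` are my own illustrative names). In the struct-of-arrays style the sum over components must be spelled out; in the array-of-structs style it is just a reduction over an axis.

```python
import collections

import jax.numpy as jnp

Vecs = collections.namedtuple('Vecs', ['x', 'y', 'z'])

def dot_soa(a: Vecs, b: Vecs):
    # Struct-of-arrays: write the component sum out by hand.
    return a.x * b.x + a.y * b.y + a.z * b.z

def dot_aos(a, b):
    # Array-of-structs: a and b have shape (..., 3); reduce over the last axis.
    return jnp.sum(a * b, axis=-1)

a = jnp.ones((4, 3))
b = jnp.ones((4, 3))
print(dot_aos(a, b))
print(dot_soa(Vecs(a[:, 0], a[:, 1], a[:, 2]),
              Vecs(b[:, 0], b[:, 1], b[:, 2])))
```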
Ideally, I'd say a system like JAX could provide both expressiveness and performance: I could have arrays with many dimensions, including small dimensions of size 3 or 4 or whatever, and during jitting JAX would decide for itself how to lay out those arrays and what the strides for each axis should be. If my GPU prefers a struct-of-arrays-type layout, so be it; and if my CPU prefers the fields of each object to sit together in memory, then we can compile it that way as well.
How close is current JAX to that ideal in practice, though? Does it reason about the order of strides at all, or does it simply use C order everywhere? Are there plans for getting the compiler involved in being smart about this? Is there any JAX style guide with best practices on this front?