Open
Labels
broadcast (Applying a function over a collection), compiler:codegen (Generation of LLVM IR and native code), performance (Must go faster)
Description
I looked a bit into why #54520 caused such a big latency improvement. The issue (or at least one issue) is that, with the broadcast code, a large majority of the time is spent in our own alloc optimization pass:
As can be seen, this pass runs four times, each time taking quite a long time. The time spent in this pass almost completely disappears when the broadcasting code is replaced with a loop (as was done in #54520).
Some latency gains might be possible here if one can:
- Figure out why this pass has such a problem with the broadcasting code.
- Understand whether it is really required to run the pass four times.
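For readers unfamiliar with the kind of rewrite referred to above, here is a minimal sketch of a broadcast expression next to an equivalent explicit loop. The function names and the computation are hypothetical and chosen only for illustration; they are not the code changed in #54520. Timing the first call of each function in a fresh session gives a rough view of latency, since the first call includes compilation.

```julia
# Hypothetical example functions; NOT the code changed in #54520.

# Broadcast version: elementwise update written with dot syntax.
function scale_add_broadcast!(out, a, x, y)
    out .= a .* x .+ y
    return out
end

# Loop version: the same computation written as an explicit loop.
function scale_add_loop!(out, a, x, y)
    # eachindex with several arrays throws if their indices do not match.
    for i in eachindex(out, x, y)
        out[i] = a * x[i] + y[i]
    end
    return out
end

x = rand(100)
y = rand(100)
out_b = similar(x)
out_l = similar(x)

# The first call of each function triggers compilation, so these timings
# are dominated by latency rather than by the runtime of the computation.
t_broadcast = @elapsed scale_add_broadcast!(out_b, 2.0, x, y)
t_loop = @elapsed scale_add_loop!(out_l, 2.0, x, y)

println("first call, broadcast: ", t_broadcast, " s")
println("first call, loop:      ", t_loop, " s")
```

Such a toy case will not necessarily reproduce the behavior described in this issue; it only shows the shape of the two variants being compared.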