Backports for KernelAbstractions 0.9.35 #611

Merged
merged 3 commits into release-0.9 from vc/backports-release-0.9
Jun 10, 2025

Conversation

vchuravy
Member

@vchuravy vchuravy commented Jun 10, 2025

nsajko and others added 2 commits June 10, 2025 15:01
Co-authored-by: Valentin Churavy <v.churavy@gmail.com>
(cherry picked from commit 474050e)
Contributor

github-actions bot commented Jun 10, 2025

Benchmark Results

| Benchmark | main | 69198a1... | main / 69198a1... |
|---|---|---|---|
| saxpy/default/Float32/1024 | 0.0565 ± 0.029 ms | 0.648 ± 0.011 μs | 87.2 ± 46 |
| saxpy/default/Float32/1048576 | 0.455 ± 0.021 ms | 0.262 ± 0.027 ms | 1.74 ± 0.19 |
| saxpy/default/Float32/16384 | 0.0582 ± 0.029 ms | 3.07 ± 0.92 μs | 19 ± 11 |
| saxpy/default/Float32/2048 | 0.0453 ± 0.027 ms | 0.76 ± 0.069 μs | 59.5 ± 37 |
| saxpy/default/Float32/256 | 0.0623 ± 0.029 ms | 0.574 ± 0.0063 μs | 109 ± 50 |
| saxpy/default/Float32/262144 | 0.157 ± 0.026 ms | 0.0572 ± 0.0045 ms | 2.75 ± 0.5 |
| saxpy/default/Float32/32768 | 0.0636 ± 0.028 ms | 6.1 ± 1.5 μs | 10.4 ± 5.2 |
| saxpy/default/Float32/4096 | 0.0481 ± 0.029 ms | 1.12 ± 0.071 μs | 43.2 ± 26 |
| saxpy/default/Float32/512 | 0.0649 ± 0.028 ms | 0.61 ± 0.0095 μs | 106 ± 46 |
| saxpy/default/Float32/64 | 0.0491 ± 0.029 ms | 0.563 ± 0.0064 μs | 87.2 ± 52 |
| saxpy/default/Float32/65536 | 0.0754 ± 0.028 ms | 13.4 ± 1.4 μs | 5.63 ± 2.2 |
| saxpy/default/Float64/1024 | 0.0516 ± 0.029 ms | 0.761 ± 0.035 μs | 67.8 ± 39 |
| saxpy/default/Float64/1048576 | 0.591 ± 0.082 ms | 0.535 ± 0.068 ms | 1.11 ± 0.21 |
| saxpy/default/Float64/16384 | 0.0673 ± 0.028 ms | 5.67 ± 1.2 μs | 11.9 ± 5.5 |
| saxpy/default/Float64/2048 | 0.0438 ± 0.026 ms | 1.12 ± 0.082 μs | 39 ± 23 |
| saxpy/default/Float64/256 | 0.062 ± 0.029 ms | 0.59 ± 0.0063 μs | 105 ± 49 |
| saxpy/default/Float64/262144 | 0.169 ± 0.026 ms | 0.117 ± 0.012 ms | 1.45 ± 0.27 |
| saxpy/default/Float64/32768 | 0.0683 ± 0.027 ms | 13 ± 1.3 μs | 5.25 ± 2.1 |
| saxpy/default/Float64/4096 | 0.0492 ± 0.027 ms | 1.74 ± 0.25 μs | 28.3 ± 16 |
| saxpy/default/Float64/512 | 0.0603 ± 0.029 ms | 0.643 ± 0.011 μs | 93.8 ± 46 |
| saxpy/default/Float64/64 | 0.0551 ± 0.03 ms | 0.564 ± 0.0053 μs | 97.7 ± 54 |
| saxpy/default/Float64/65536 | 0.0867 ± 0.027 ms | 28.8 ± 2.9 μs | 3.01 ± 0.98 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 0.0433 ± 0.029 ms | 2.12 ± 0.03 μs | 20.5 ± 14 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.45 ± 0.02 ms | 0.248 ± 0.024 ms | 1.81 ± 0.19 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 0.0536 ± 0.027 ms | 4.62 ± 0.75 μs | 11.6 ± 6.1 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 0.0429 ± 0.026 ms | 2.26 ± 0.047 μs | 19 ± 12 |
| saxpy/static workgroup=(1024,)/Float32/256 | 0.0558 ± 0.027 ms | 2.61 ± 0.041 μs | 21.4 ± 10 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.158 ± 0.027 ms | 0.0602 ± 0.0053 ms | 2.62 ± 0.5 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 0.0589 ± 0.026 ms | 7.86 ± 1 μs | 7.5 ± 3.5 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 0.0482 ± 0.028 ms | 2.54 ± 0.067 μs | 18.9 ± 11 |
| saxpy/static workgroup=(1024,)/Float32/512 | 0.06 ± 0.028 ms | 2.63 ± 0.037 μs | 22.8 ± 11 |
| saxpy/static workgroup=(1024,)/Float32/64 | 0.0539 ± 0.027 ms | 2.63 ± 5.1 μs | 20.5 ± 41 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 0.073 ± 0.026 ms | 15.6 ± 1.4 μs | 4.67 ± 1.7 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 0.0441 ± 0.03 ms | 2.25 ± 0.053 μs | 19.6 ± 13 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.572 ± 0.064 ms | 0.56 ± 0.042 ms | 1.02 ± 0.14 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 0.063 ± 0.026 ms | 7.69 ± 0.96 μs | 8.19 ± 3.6 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 0.0427 ± 0.026 ms | 2.52 ± 0.066 μs | 16.9 ± 10 |
| saxpy/static workgroup=(1024,)/Float64/256 | 0.0622 ± 0.027 ms | 2.6 ± 0.056 μs | 24 ± 10 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.169 ± 0.026 ms | 0.102 ± 0.014 ms | 1.67 ± 0.35 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 0.0668 ± 0.026 ms | 15.7 ± 1.4 μs | 4.25 ± 1.7 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 0.048 ± 0.026 ms | 3.19 ± 0.22 μs | 15.1 ± 8.3 |
| saxpy/static workgroup=(1024,)/Float64/512 | 0.0585 ± 0.029 ms | 2.6 ± 0.064 μs | 22.5 ± 11 |
| saxpy/static workgroup=(1024,)/Float64/64 | 0.0581 ± 0.027 ms | 2.56 ± 0.067 μs | 22.7 ± 11 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 0.0885 ± 0.027 ms | 29.1 ± 5.9 μs | 3.04 ± 1.1 |
| time_to_load | 1.37 ± 0.0047 s | 0.319 ± 0.004 s | 4.31 ± 0.056 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

* Use Float32 in examples with backends that don't support Float64

* Add `backend` argument for `examples_testset`

* Reduce `TILE_DIM` for compatibility

Metal doesn't always support 1024 threads, which causes intermittent errors with 32x32 tiles

* Fix histogram implementation

The final part of the loop expects every thread to exist, so we cannot skip launching them. Instead, avoid doing work on the extra threads until then.

Also use Int32 since some backends lack Int64 atomics, and give one of the tests an unusual groupsize, since that's when the errors used to pop up.

(cherry picked from commit dab03b9)
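
The reasoning behind the histogram fix can be illustrated with a small sequential simulation. This is a plain-Python sketch, not the KernelAbstractions kernel from the commit: the function `simulate_histogram` and its `launch_padding_threads` flag are hypothetical names, and the two-phase structure (bin into shared memory, then have each thread flush its own slot) is an assumption based on the commit message above. It models a workgroup whose last group is only partly covered by the input.

```python
from collections import Counter
from math import ceil


def simulate_histogram(values, nbins, group_size, launch_padding_threads):
    """Sequentially simulate a workgroup-based histogram (hypothetical sketch).

    `values` holds 0-based bin indices. With launch_padding_threads=False,
    threads whose global index is past the end of the input never run at all.
    With launch_padding_threads=True they run but skip the input read.
    """
    n = len(values)
    out = [0] * nbins
    for group in range(ceil(n / group_size)):
        if launch_padding_threads:
            lanes = list(range(group_size))
        else:
            lanes = [l for l in range(group_size) if group * group_size + l < n]

        # Bins are handled in chunks of group_size so a chunk fits in shared memory.
        for first_bin in range(0, nbins, group_size):
            shared = [0] * group_size  # stands in for zeroed local memory

            # Phase 1: each running thread bins at most one input element.
            for lane in lanes:
                tid = group * group_size + lane
                if tid < n:  # padding threads do no work here
                    b = values[tid]
                    if first_bin <= b < min(first_bin + group_size, nbins):
                        shared[b - first_bin] += 1

            # A barrier would sit here on a real device.

            # Phase 2: thread `lane` flushes shared slot `lane` to the output.
            for lane in lanes:
                if first_bin + lane < nbins:
                    out[first_bin + lane] += shared[lane]
    return out


# 10 elements with a group size of 8: the second group has only 2 real elements.
values = [5, 1, 7, 5, 0, 3, 5, 2, 6, 6]
expected = [Counter(values)[b] for b in range(8)]

fixed = simulate_histogram(values, nbins=8, group_size=8, launch_padding_threads=True)
buggy = simulate_histogram(values, nbins=8, group_size=8, launch_padding_threads=False)
```

In this sketch `fixed` matches `expected`, while `buggy` loses the two counts for bin 6: the last group only runs lanes 0 and 1, so nobody flushes shared slot 6. That is the sense in which the final part of the loop needs every thread to exist, with only the input read guarded.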
@vchuravy vchuravy requested a review from christiangnrd June 10, 2025 13:54
Member

@christiangnrd christiangnrd left a comment

LGTM

@vchuravy vchuravy merged commit f7a37d0 into release-0.9 Jun 10, 2025
33 of 37 checks passed
@vchuravy vchuravy deleted the vc/backports-release-0.9 branch June 10, 2025 18:58
3 participants