Avoid the exception branch in expand #518

Merged
vchuravy merged 4 commits into main from vc/avoid_trap
Jan 21, 2025

vchuravy (Member)

No description provided.

github-actions bot (Contributor) commented Jan 20, 2025

Benchmark Results

| Benchmark | main | 2a4270e... | main/2a4270e1998bb0... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.73 ± 0.0094 μs | 0.736 ± 0.0096 μs | 0.991 |
| saxpy/default/Float16/1048576 | 0.172 ± 0.0095 ms | 0.174 ± 0.0089 ms | 0.991 |
| saxpy/default/Float16/16384 | 3.34 ± 0.043 μs | 3.34 ± 0.039 μs | 1 |
| saxpy/default/Float16/2048 | 0.907 ± 0.013 μs | 0.911 ± 0.011 μs | 0.995 |
| saxpy/default/Float16/256 | 0.582 ± 0.007 μs | 0.59 ± 0.0069 μs | 0.986 |
| saxpy/default/Float16/262144 | 0.0441 ± 0.00042 ms | 0.0443 ± 0.00046 ms | 0.997 |
| saxpy/default/Float16/32768 | 6.02 ± 0.062 μs | 6.02 ± 0.084 μs | 1 |
| saxpy/default/Float16/4096 | 1.31 ± 0.026 μs | 1.32 ± 0.026 μs | 0.992 |
| saxpy/default/Float16/512 | 0.643 ± 0.0092 μs | 0.648 ± 0.0083 μs | 0.992 |
| saxpy/default/Float16/64 | 0.548 ± 0.0067 μs | 0.556 ± 0.0068 μs | 0.985 |
| saxpy/default/Float16/65536 | 11.7 ± 0.13 μs | 11.6 ± 0.15 μs | 1 |
| saxpy/default/Float32/1024 | 0.639 ± 0.011 μs | 0.63 ± 0.0095 μs | 1.01 |
| saxpy/default/Float32/1048576 | 0.189 ± 0.023 ms | 0.223 ± 0.026 ms | 0.847 |
| saxpy/default/Float32/16384 | 2.78 ± 0.23 μs | 2.77 ± 0.12 μs | 1 |
| saxpy/default/Float32/2048 | 0.749 ± 0.05 μs | 0.769 ± 0.048 μs | 0.975 |
| saxpy/default/Float32/256 | 0.561 ± 0.0062 μs | 0.561 ± 0.0054 μs | 0.999 |
| saxpy/default/Float32/262144 | 0.0548 ± 0.0047 ms | 0.0449 ± 0.0017 ms | 1.22 |
| saxpy/default/Float32/32768 | 5.53 ± 0.88 μs | 5.26 ± 0.22 μs | 1.05 |
| saxpy/default/Float32/4096 | 1.14 ± 0.1 μs | 1.13 ± 0.1 μs | 1.01 |
| saxpy/default/Float32/512 | 0.604 ± 0.0091 μs | 0.596 ± 0.0075 μs | 1.01 |
| saxpy/default/Float32/64 | 0.547 ± 0.0061 μs | 0.552 ± 0.0057 μs | 0.992 |
| saxpy/default/Float32/65536 | 12.7 ± 1 μs | 11.8 ± 1 μs | 1.08 |
| saxpy/default/Float64/1024 | 0.747 ± 0.035 μs | 0.769 ± 0.055 μs | 0.97 |
| saxpy/default/Float64/1048576 | 0.488 ± 0.035 ms | 0.496 ± 0.024 ms | 0.983 |
| saxpy/default/Float64/16384 | 5.51 ± 0.78 μs | 5.26 ± 0.29 μs | 1.05 |
| saxpy/default/Float64/2048 | 1.15 ± 0.098 μs | 1.15 ± 0.094 μs | 1 |
| saxpy/default/Float64/256 | 0.582 ± 0.0077 μs | 0.586 ± 0.0071 μs | 0.993 |
| saxpy/default/Float64/262144 | 0.107 ± 0.019 ms | 0.114 ± 0.0079 ms | 0.934 |
| saxpy/default/Float64/32768 | 12.3 ± 1.1 μs | 12.5 ± 0.64 μs | 0.982 |
| saxpy/default/Float64/4096 | 1.7 ± 0.24 μs | 1.68 ± 0.12 μs | 1.01 |
| saxpy/default/Float64/512 | 0.636 ± 0.01 μs | 0.637 ± 0.01 μs | 0.999 |
| saxpy/default/Float64/64 | 0.557 ± 0.007 μs | 0.563 ± 0.0064 μs | 0.988 |
| saxpy/default/Float64/65536 | 23.3 ± 1.9 μs | 28.6 ± 1.4 μs | 0.816 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.17 ± 0.027 μs | 2.16 ± 0.028 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.157 ± 0.0035 ms | 0.159 ± 0.0083 ms | 0.989 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.41 ± 0.07 μs | 4.44 ± 0.14 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.34 ± 0.03 μs | 2.33 ± 0.027 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.8 ± 0.033 μs | 2.82 ± 0.035 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.042 ± 0.00092 ms | 0.0422 ± 0.0012 ms | 0.995 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.87 ± 0.16 μs | 6.89 ± 0.26 μs | 0.997 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.67 ± 0.041 μs | 2.66 ± 0.036 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.25 ± 0.035 μs | 3.27 ± 0.036 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.51 ± 0.22 μs | 2.53 ± 0.2 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.6 ± 0.27 μs | 12.7 ± 0.55 μs | 0.991 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 2.24 ± 0.032 μs | 2.21 ± 0.038 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.235 ± 0.017 ms | 0.213 ± 0.032 ms | 1.11 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.41 ± 0.25 μs | 4.44 ± 0.69 μs | 0.993 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.39 ± 0.055 μs | 2.37 ± 0.057 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.65 ± 0.053 μs | 2.67 ± 0.042 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0592 ± 0.0037 ms | 0.0479 ± 0.0035 ms | 1.24 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.5 ± 0.36 μs | 7.37 ± 0.5 μs | 1.02 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.69 ± 0.11 μs | 2.66 ± 0.079 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.71 ± 0.085 μs | 2.72 ± 0.093 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.83 ± 5.4 μs | 2.71 ± 5.5 μs | 1.04 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 15.8 ± 1.6 μs | 14.9 ± 1.6 μs | 1.07 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.33 ± 0.065 μs | 2.31 ± 0.051 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.527 ± 0.029 ms | 0.497 ± 0.04 ms | 1.06 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.24 ± 0.4 μs | 7.45 ± 1.1 μs | 0.971 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.61 ± 0.088 μs | 2.59 ± 0.082 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.68 ± 0.078 μs | 2.68 ± 0.11 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.092 ± 0.0076 ms | 0.117 ± 0.0081 ms | 0.784 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 14.6 ± 1.4 μs | 15.6 ± 1.2 μs | 0.93 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.15 ± 0.16 μs | 3.15 ± 0.21 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.65 ± 0.059 μs | 2.64 ± 0.053 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.61 ± 0.072 μs | 2.59 ± 0.066 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 26.7 ± 2.7 μs | 31.3 ± 1.5 μs | 0.852 |
| time_to_load | 0.322 ± 0.0018 s | 0.323 ± 0.0039 s | 0.997 |
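
Reading the last column: judging from the numbers, it appears to be the main timing divided by the timing at 2a4270e..., so values above 1 would mean the PR commit ran faster and values below 1 that it ran slower. The sketch below recomputes that ratio for three rows copied from the table above. The helper name `ratio` and the choice of rows are illustrative assumptions, not part of the benchmark tooling, and because the table shows rounded timings the recomputed values may differ from the reported ones in the last digit.

```python
# Mean timings copied from three rows of the benchmark table: (main, PR commit).
# Both values in a row share the same unit, so the ratio is unitless.
rows = {
    "saxpy/default/Float16/1024": (0.73, 0.736),       # μs
    "saxpy/default/Float32/262144": (0.0548, 0.0449),  # ms
    "saxpy/default/Float64/65536": (23.3, 28.6),       # μs
}


def ratio(main_time, pr_time):
    """Assumed definition of the last column: main divided by the PR commit."""
    return main_time / pr_time


for name, (main_time, pr_time) in rows.items():
    r = ratio(main_time, pr_time)
    verdict = "PR faster" if r > 1 else "PR slower or equal"
    print(f"{name}: ratio {r:.3f} ({verdict})")
```

Several rows have a ± spread comparable to the difference between the two columns (for example `saxpy/static workgroup=(1024,)/Float32/64` at 2.83 ± 5.4 μs), so a ratio near 1 for such rows is likely within measurement noise.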

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

vchuravy marked this pull request as ready for review January 21, 2025 13:33
github-actions bot (Contributor) left a comment

Some suggestions could not be made:

  • src/nditeration.jl
    • lines 117-117

vchuravy and others added 4 commits January 21, 2025 14:53
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
vchuravy merged commit 4585ca9 into main Jan 21, 2025
31 of 34 checks passed
vchuravy deleted the vc/avoid_trap branch January 21, 2025 14:08
vchuravy added a commit that referenced this pull request Jan 28, 2025