Support JLD2 for (de)serialization #2789

Open: wants to merge 1 commit into base: master

@lassepe lassepe commented May 28, 2025

Currently, serializing a CUDA array with JLD2 fails silently: JLD2 serializes a pointer, and that pointer is no longer valid when the file is deserialized. This PR adds a JLD2 extension that handles this case more gracefully by hooking into the JLD2.writeas mechanism.
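
For readers unfamiliar with the hook, here is a minimal sketch of how JLD2's custom serialization is typically wired up. It is not the code from this PR: `DeviceVector` and `DeviceVectorOnDisk` are hypothetical stand-ins so the example runs without a GPU, and it assumes a JLD2 version that provides `JLD2.wconvert` and `JLD2.rconvert` alongside `JLD2.writeas`.

```julia
using JLD2

# Hypothetical stand-in for a GPU array type. In the real case the payload
# sits behind a device pointer that is meaningless in a later session.
struct DeviceVector{T}
    data::Vector{T}
end

# Hypothetical plain representation that is safe to write to disk.
struct DeviceVectorOnDisk{T}
    host::Vector{T}
end

# Tell JLD2 to store DeviceVector{T} as DeviceVectorOnDisk{T}.
JLD2.writeas(::Type{DeviceVector{T}}) where {T} = DeviceVectorOnDisk{T}

# Conversion applied when writing: copy the payload into the on-disk type.
JLD2.wconvert(::Type{DeviceVectorOnDisk{T}}, x::DeviceVector{T}) where {T} =
    DeviceVectorOnDisk{T}(copy(x.data))

# Conversion applied when reading: rebuild the original type from the payload.
JLD2.rconvert(::Type{DeviceVector{T}}, s::DeviceVectorOnDisk{T}) where {T} =
    DeviceVector{T}(copy(s.host))

# Round trip through a temporary file.
path = tempname() * ".jld2"
jldopen(path, "w") do file
    file["x"] = DeviceVector([1.0, 2.0, 3.0])
end
restored = jldopen(path, "r") do file
    file["x"]
end
```

For a CUDA array, the on-disk type would presumably be a host `Array`, with the write conversion copying device memory to the host and the read conversion uploading it again. That mapping is an assumption about the approach, not a description of the extension's actual code.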

github-actions bot commented May 28, 2025

Your PR no longer requires formatting changes. Thank you for your contribution!

@lassepe lassepe force-pushed the feature/jld2_extension branch from e53d82b to 915565a Compare May 28, 2025 09:04

lassepe commented May 28, 2025

Buildkite failure seems unrelated.

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 915565a | Previous: 1004fd9 | Ratio |
|---|---|---|---|
| latency/precompile | 43090085487 ns | 43347615330 ns | 0.99 |
| latency/ttfp | 7157885101 ns | 7144432561 ns | 1.00 |
| latency/import | 3445460616 ns | 3422583362 ns | 1.01 |
| integration/volumerhs | 9608976 ns | 9621539.5 ns | 1.00 |
| integration/byval/slices=1 | 147297.5 ns | 147204 ns | 1.00 |
| integration/byval/slices=3 | 425946 ns | 426126.5 ns | 1.00 |
| integration/byval/reference | 145338 ns | 145306 ns | 1.00 |
| integration/byval/slices=2 | 286695.5 ns | 286683 ns | 1.00 |
| integration/cudadevrt | 103611 ns | 103645 ns | 1.00 |
| kernel/indexing | 14400 ns | 14297 ns | 1.01 |
| kernel/indexing_checked | 15055 ns | 14899 ns | 1.01 |
| kernel/occupancy | 715.6597222222222 ns | 742.9652777777778 ns | 0.96 |
| kernel/launch | 2247.6666666666665 ns | 2214.8888888888887 ns | 1.01 |
| kernel/rand | 15456 ns | 18344.5 ns | 0.84 |
| array/reverse/1d | 20207 ns | 19882 ns | 1.02 |
| array/reverse/2d | 25271 ns | 23850 ns | 1.06 |
| array/reverse/1d_inplace | 11469 ns | 10508 ns | 1.09 |
| array/reverse/2d_inplace | 13076 ns | 12080 ns | 1.08 |
| array/copy | 21159 ns | 21206 ns | 1.00 |
| array/iteration/findall/int | 160004.5 ns | 159710.5 ns | 1.00 |
| array/iteration/findall/bool | 139875 ns | 139788 ns | 1.00 |
| array/iteration/findfirst/int | 164540.5 ns | 163426.5 ns | 1.01 |
| array/iteration/findfirst/bool | 165590 ns | 164898.5 ns | 1.00 |
| array/iteration/scalar | 73700 ns | 73632 ns | 1.00 |
| array/iteration/logical | 218991.5 ns | 219076 ns | 1.00 |
| array/iteration/findmin/1d | 47585 ns | 47341 ns | 1.01 |
| array/iteration/findmin/2d | 97654 ns | 97310 ns | 1.00 |
| array/reductions/reduce/1d | 37152 ns | 36812 ns | 1.01 |
| array/reductions/reduce/2d | 41874 ns | 41111 ns | 1.02 |
| array/reductions/mapreduce/1d | 35265 ns | 35000.5 ns | 1.01 |
| array/reductions/mapreduce/2d | 41376 ns | 41081 ns | 1.01 |
| array/broadcast | 21459 ns | 21375 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 12714 ns | 12932 ns | 0.98 |
| array/copyto!/cpu_to_gpu | 218545 ns | 217571 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 286037 ns | 286553 ns | 1.00 |
| array/accumulate/1d | 109547 ns | 109198 ns | 1.00 |
| array/accumulate/2d | 81383 ns | 81190 ns | 1.00 |
| array/construct | 1256.3 ns | 1248.9 ns | 1.01 |
| array/random/randn/Float32 | 44210 ns | 44976.5 ns | 0.98 |
| array/random/randn!/Float32 | 25252 ns | 25430 ns | 0.99 |
| array/random/rand!/Int64 | 27332 ns | 27147 ns | 1.01 |
| array/random/rand!/Float32 | 8805 ns | 8919.333333333334 ns | 0.99 |
| array/random/rand/Int64 | 30361 ns | 30142 ns | 1.01 |
| array/random/rand/Float32 | 13382 ns | 13307 ns | 1.01 |
| array/permutedims/4d | 61774 ns | 61252 ns | 1.01 |
| array/permutedims/2d | 55763 ns | 55576 ns | 1.00 |
| array/permutedims/3d | 56487 ns | 56240 ns | 1.00 |
| array/sorting/1d | 2765868 ns | 2778491 ns | 1.00 |
| array/sorting/by | 3370345 ns | 3369601 ns | 1.00 |
| array/sorting/2d | 1087106 ns | 1087382 ns | 1.00 |
| cuda/synchronization/stream/auto | 1049.7 ns | 1018.7 ns | 1.03 |
| cuda/synchronization/stream/nonblocking | 7255.6 ns | 8223 ns | 0.88 |
| cuda/synchronization/stream/blocking | 830.5591397849462 ns | 800.9072164948453 ns | 1.04 |
| cuda/synchronization/context/auto | 1172.5 ns | 1159.8 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 7133.1 ns | 7478.9 ns | 0.95 |
| cuda/synchronization/context/blocking | 921.1160714285714 ns | 897.4426229508197 ns | 1.03 |

This comment was automatically generated by workflow using github-action-benchmark.
