Version 0.8.0 Beta 2: nvFatbin support, CUDA 12.x features, sync-async op unification, etc. #697
eyalroz announced in Announcements
Changes since v0.7.1:
Support for the nvFatbin library (#681)
- The cuda::fatbin_builder_t class: One creates a builder, adds various fragments of fatbin-contained content (cubin, PTX, LTO IR etc.), then finally uses the build() or build_at() method to obtain the complete, final fatbin file data in a region of memory.
- A new CMake target, cuda-api-wrappers::fatbin, which one should depend on when actually using the builder.
Support for more CUDA 12.x features
- Enumerating the kernels of a cuda::module_t, with the method unique_span<kernel_t> get_kernels() const.
- Obtaining a kernel's mangled name, with kernel_t::mangled_name() (regards #674, "Make mangled and unmangled (kernel) names more difficult to mix up").
(Note: these features are not accessible if you're using the wrappers with CUDA 11.x.)
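To illustrate the distinction that kernel_t::mangled_name() makes explicit: on Linux, nvcc emits kernel names in Itanium C++ ABI mangled form, so a hypothetical kernel vecAdd(float*, float*, float*, int) is known to the driver as _Z6vecAddPfS_S_i. The sketch below is not part of cuda-api-wrappers; demangle() is a hypothetical host-only helper built on the GCC/Clang-specific abi::__cxa_demangle, showing how the readable form relates to the mangled one.

```cpp
#include <cassert>
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang-specific, not standard C++)
#include <memory>
#include <string>

// Deleter for the buffer __cxa_demangle returns, which is allocated with malloc().
struct free_deleter {
    void operator()(char* p) const { std::free(p); }
};

// Hypothetical helper: converts an Itanium-ABI mangled name into its
// human-readable form. Returns the input unchanged if __cxa_demangle
// reports a failure.
std::string demangle(const std::string& mangled_name)
{
    int status = 0;
    std::unique_ptr<char, free_deleter> readable{
        abi::__cxa_demangle(mangled_name.c_str(), nullptr, nullptr, &status)};
    return (status == 0 && readable) ? std::string{readable.get()} : mangled_name;
}
```

For example, demangle("_Z6vecAddPfS_S_i") yields "vecAdd(float*, float*, float*, int)". Only the mangled string is a valid key for looking the kernel up in a module, which is presumably why the wrappers keep the two kinds of names hard to confuse.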
More unique_span class changes
Like a recently-cut gem, one slowly polishes it until it shines brightly... we did some work on unique_span in version 0.7.1 as well, and it continues:
- A swap() implementation
- Conversion from a unique_span of T to a span of const T
- Neither our .release(), nor our move construction, can be noexcept - removed that marking, which was based only on optimism
optional_ref & partial unification of async and non-async memory operations
- A new optional_ref class, for passing optional arguments which are references. See Foonathan's blog post about the problems of putting references in C++ optionals.
- Several memory functions which had both cuda::memory::foo() and cuda::memory::async::foo() variants now have a single variant, cuda::memory::foo(), which takes an extra optional_ref<stream_t> parameter: When it's not set, the operation is synchronous(ish); when it is set, the operation is asynchronous and scheduled on the stream.
- The synchronous and asynchronous variants of copy_single() had disagreed - one took a pointer, the other a reference. With their unification, they now agree (and take a pointer).
Bug fixes
- The poor man's optional class:
  - value_or() now returns a value...
  - value_or() is now const
In example programs
Other changes
Build mechanism
In the wrapper APIs themselves
- Now using size_t's in launch_config_builder_t's methods, so as to prevent narrowing-cast warnings, and checking limits ourselves.
In example programs
This discussion was created from the release Version 0.8.0 Beta 2: nvFatbin support, CUDA 12.x features, sync-async op unification, etc.