Support for runtime CPU/CUDA selection #3060
base: master
Conversation
Nice, let me test this a bit and look it over.
Hey @davisking! Do you have any feedback, or did you get a chance to test this code?
Pull Request Overview
This PR introduces runtime support for selecting between CPU and CUDA implementations in the dnn module by adding new functions (`use_cuda()` and `set_use_cuda(bool)`) and wrapping key tensor operation calls with runtime branches.
- In test code (`dlib/test/dnn.cpp`), additional calls to `set_use_cuda(true)` and `set_use_cuda(false)` have been added to drive the runtime behavior.
- In `dlib/cuda/tensor_tools.h`, the preprocessor-guarded calls have been refactored so that calls to CUDA functions are conditionally executed based on the current runtime flag, while still preserving compatibility when CUDA is not linked.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| dlib/test/dnn.cpp | Added calls to `set_use_cuda(true)`/`set_use_cuda(false)` to force the runtime branch for CUDA tests. |
| dlib/cuda/tensor_tools.h | Refactored tensor tool functions to select between `cuda_impl` and `cpu_impl` dynamically at runtime. |
Comments suppressed due to low confidence (2)
dlib/test/dnn.cpp:5029
- Consider adding an inline comment or an explicit assertion to verify that the call to `set_use_cuda(true)` correctly configures the runtime behavior for CUDA functions, and that a corresponding call to `set_use_cuda(false)` properly resets the state after the GPU-specific tests.

```cpp
set_use_cuda(true);
```

dlib/cuda/tensor_tools.h:24
- Consider adding unit tests specifically targeting the new runtime selection API (`use_cuda()` and `set_use_cuda(bool)`) to ensure they return the expected values in different scenarios, thereby validating that the dynamic dispatch in tensor tool functions works as intended.

```cpp
bool use_cuda();
```
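A minimal sketch (not part of the PR) of the kind of unit test suggested above, using dlib's `DLIB_TEST` macro; the default asserted here assumes CUDA builds start with `use_cuda() == true`, as stated in the PR description:

```cpp
#ifdef DLIB_USE_CUDA
    DLIB_TEST(dlib::use_cuda() == true);   // CUDA builds default to the CUDA path
    dlib::set_use_cuda(false);
    DLIB_TEST(dlib::use_cuda() == false);  // runtime switch to the CPU path
    dlib::set_use_cuda(true);              // restore the default for later tests
#endif
```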
- Throw an error when image width or height is 0. Fixes mozilla/mozjpeg#140, closes davisking#7.
- `throw()` has been deprecated since C++11 and was removed in C++20; `noexcept` is the accepted alternative.
- (…isking#3063) If the extra field was larger than the space the user provided with inflateGetHeader(), and if multiple calls of inflate() delivered the extra header data, then there could be a buffer overflow of the provided space. This commit assures that the provided space is not exceeded.
- Solve a build error when compiling with GCC 15.
- (…g#3056) Add a new BPE_Tokenizer class to Dlib: implement BPE (Byte Pair Encoding) tokenization with training and encoding methods and unit tests; optimize encoding with parallel paragraph processing; use "in-memory" files to avoid leaving any traces on disk during the test; add one DLIB_TEST_MSG() test per encoded/decoded string; add bpe_tokenizer_abstract.h for documentation and integration.
Yeah sorry. I know I'm super late to review this. Looking at it now a bit. Just pushed a merge commit with master into the PR too.
```cpp
bool use_cuda();
void set_use_cuda(bool flag);
```
Add any new public functions to the docs (the `_abstract.h` files). These should be documented next to `set_dnn_prefer_smallest_algorithms()`, so over in `dlib/dnn/core_abstract.h`.
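A hypothetical sketch of what that documentation might look like, following the spec-comment conventions of dlib's `*_abstract.h` files (the wording is assumed, not taken from the PR):

```cpp
void set_use_cuda (
    bool flag
);
/*!
    ensures
        - #use_cuda() == flag
!*/

bool use_cuda (
);
/*!
    ensures
        - returns true if the tensor tools will dispatch to their CUDA
          implementations and false if they will use the CPU versions.
!*/
```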
dlib/cuda/tensor_tools.cpp (Outdated)
```cpp
if (use_cuda())
{
    cuda::dot_prods(add_to, out, lhs, rhs);
}
else
{
    out = sum_cols(pointwise_multiply(mat(lhs), mat(rhs)));
}
#endif
```
So this PR is a good idea. But can you change these so they are simpler? The preprocessor macro logic is now pretty complicated and makes the code hard to read.
But it could be done like this instead:
```cpp
void dot_prods (
    resizable_tensor& out,
    const tensor& lhs,
    const tensor& rhs
)
{
    IF_DLIB_USE_CUDA(
        cuda::dot_prods(out, lhs, rhs);
    )
    IF_DLIB_NOT_USE_CUDA(
        out = sum_cols(pointwise_multiply(mat(lhs), mat(rhs)));
    )
}
```
Which would be a pretty trivial mechanical change from what dlib has now and doesn't really change the readability of the code. And inside `IF_DLIB_USE_CUDA` you just have an `if (use_cuda())` wrapped inside a `#ifdef DLIB_USE_CUDA`.
That is, define these macros:
```cpp
#ifdef DLIB_USE_CUDA
#define IF_DLIB_USE_CUDA(expression) if (use_cuda()) { expression }
#else
#define IF_DLIB_USE_CUDA(expression)
#endif

#ifdef DLIB_USE_CUDA
#define IF_DLIB_NOT_USE_CUDA(expression) if (!use_cuda()) { expression }
#else
#define IF_DLIB_NOT_USE_CUDA(expression) expression
#endif
```
and then any code that looks like this on master:
```cpp
#ifdef DLIB_USE_CUDA
    foo;
    bar;
    whatever;
#else
    foo_cpu;
    whatever2;
#endif
```
Just turns into:
```cpp
IF_DLIB_USE_CUDA(
    foo;
    bar;
    whatever;
)
IF_DLIB_NOT_USE_CUDA(
    foo_cpu;
    whatever2;
)
```
Yeah sorry for the late review. Just been real busy with other things. Every day I see this in my email and think about how late I've been, but it hasn't made it up my todo list until today. Haven't had a lot of free time lately 😢 Anyway, change it around to use the macros I suggested. Then the PR will be a trivial and obviously correct mechanical change to dlib's existing code and I'll merge it right in :D
Thanks! I also had difficulty finding the time to fix this. I defined variadic macros, because commas in the wrapped code would otherwise be treated as macro argument separators.
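A sketch of the variadic form being described, assuming the same macro names as above (`__VA_ARGS__` re-joins the wrapped code even when it contains top-level commas):

```cpp
#ifdef DLIB_USE_CUDA
#define IF_DLIB_USE_CUDA(...) if (use_cuda()) { __VA_ARGS__ }
#define IF_DLIB_NOT_USE_CUDA(...) if (!use_cuda()) { __VA_ARGS__ }
#else
#define IF_DLIB_USE_CUDA(...)
#define IF_DLIB_NOT_USE_CUDA(...) __VA_ARGS__
#endif

// Wrapped code with commas outside parentheses, e.g.
//   IF_DLIB_USE_CUDA( matrix<float,0,1> tmp; cuda::foo(tmp); )
// would break a single-parameter macro but works with __VA_ARGS__.
```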
- (…king#3076) Implementation of a `linear_` layer for neural networks, providing an optimized linear transformation for multi-dimensional inputs; also adds `reshape_to` and `flatten` layers to Dlib's DNN module, with the corresponding update to `visitors.h`. Co-authored-by: Davis E. King and Copilot.
Pull Request Overview
Adds runtime selection between CPU and CUDA implementations in the DNN module, replacing compile-time `#ifdef DLIB_USE_CUDA` branches with a `use_cuda()` flag.
- Introduce `use_cuda()` and `set_use_cuda(bool)` APIs in `core_abstract.h`.
- Update tests in `dlib/test/dnn.cpp` to toggle CUDA on/off at runtime.
- Refactor `dlib/cuda/tensor_tools.h` to hold both CPU and CUDA instances and dispatch via `IF_DLIB_USE_CUDA`/`IF_DLIB_NOT_USE_CUDA` macros.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| dlib/test/dnn.cpp | Add `set_use_cuda(true/false)` calls around CUDA/CPU tests |
| dlib/dnn/core_abstract.h | Declare `use_cuda()` and `set_use_cuda(bool)` with docs |
| dlib/cuda/tensor_tools.h | Define dispatch macros, duplicate flag declarations, and split each op into CPU/CUDA branches |
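A hypothetical sketch of the dual-instance pattern the last table row describes (the member names and exact interface here are assumptions, not the PR's code):

```cpp
class pooling
{
public:
    void operator()(resizable_tensor& dest, const tensor& src)
    {
        IF_DLIB_USE_CUDA( cuda_impl(dest, src); )
        IF_DLIB_NOT_USE_CUDA( cpu_impl(dest, src); )
    }

private:
#ifdef DLIB_USE_CUDA
    cuda::pooling cuda_impl;  // only exists when CUDA is linked
#endif
    cpu::pooling cpu_impl;    // always available as the fallback
};
```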
Comments suppressed due to low confidence (2)
dlib/test/dnn.cpp:5029
- The new runtime flag behavior is only exercised in `test_multm_prev()`. Consider adding tests in other modules to validate that `use_cuda()` correctly switches implementations for all runtime-dispatched operations.

```cpp
set_use_cuda(true);
```

dlib/cuda/tensor_tools.h:36
- `use_cuda()` and `set_use_cuda()` are already declared in `dlib/dnn/core_abstract.h`. Remove these duplicate declarations and include the core header instead to avoid ODR issues.

```cpp
bool use_cuda();
```
Ah, I just realized this needs more work. I've been going over it for the last hour and a bit. I was updating the unit tests to work with and without a GPU, but I just realized that even when `use_cuda() == false` it's still using CUDA. You can see this by running, for example:

```
CUDA_VISIBLE_DEVICES= ./dnn_face_recognition_ex ../faces/bald_guys.jpg
```
with this change:
```diff
--- a/examples/dnn_face_recognition_ex.cpp
+++ b/examples/dnn_face_recognition_ex.cpp
@@ -80,6 +80,7 @@ std::vector<matrix<rgb_pixel>> jitter_image(
 int main(int argc, char** argv) try
 {
+    dlib::set_use_cuda(false);
     if (argc != 2)
     {
         cout << "Run this example by invoking it like this: " << endl;
```
You will get the error below, since the `resizable_tensor` objects are all still calling `cudaMalloc` and allocating GPU memory. So the tensor objects would need to be updated too, in a similar way to the other code, so as to not actually do any allocations when CUDA is disabled.

```
Error while calling cudaGetDevice(&the_device_id) in file /home/davis/source/dlib/dlib/cuda/gpu_data.cpp:204. code: 100, reason: no CUDA-capable device is detected
```
That `CUDA_VISIBLE_DEVICES=` prefix tells the CUDA runtime to pretend there isn't any GPU attached to the machine. That's really what you want to work here, so that someone can run this stuff on a non-GPU machine even though it's compiled with CUDA.
Anyway, I pushed the code I had. If you update the tensors to also use `use_cuda()` then it should be good to go, unless I'm forgetting some other CUDA tooling in there. But I think that's it. The key test is whether prefixing programs with `CUDA_VISIBLE_DEVICES=` results in them running without error even though they are built with CUDA support :D

I updated the unit tests in a way that should work with `CUDA_VISIBLE_DEVICES=`, but like I was saying, I went to test that after doing so and was reminded about the tensor allocation stuff :(
Oh, good catch. I didn't check the behavior with `CUDA_VISIBLE_DEVICES=`. If we conditionally allocate the CUDA memory, we should also support the scenario sketched below: we must check whether the device buffer needs to be allocated at the point where the data currency is checked.
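The scenario's original code block was lost in extraction; this is a hypothetical reconstruction of the kind of situation being described (the shapes and values are illustrative):

```cpp
#include <dlib/dnn.h>

int main()
{
    dlib::set_use_cuda(false);
    dlib::resizable_tensor t(2, 3);  // with CUDA disabled, only host memory
    t = 1;                           // is touched; the data lives on the host

    dlib::set_use_cuda(true);
    // The device copy was never allocated, so the host/device synchronization
    // logic must allocate the device buffer lazily here before copying the
    // host data over to it.
    const float* d = t.device();
    (void)d;
}
```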
Also, do you still want to add the …
Yep, that would make it all work smoothly unless I'm forgetting another part of it as well. But I think that's the only bit remaining.
Na, it's good how it is.
This PR adds support for selecting either the CPU or CUDA implementation of the dnn module at runtime. As discussed in #1852, the `#ifdef DLIB_USE_CUDA` directives in `tensor_tools.cpp` just need to be replaced with a runtime branch like the one sketched below.
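The inline example here was lost in extraction; this sketch of the branch is modeled on the `dot_prods` diff quoted earlier in the thread:

```cpp
#ifdef DLIB_USE_CUDA
    if (use_cuda())
        cuda::dot_prods(out, lhs, rhs);
    else
#endif
        out = sum_cols(pointwise_multiply(mat(lhs), mat(rhs)));
```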
The thread-local variable returned by `use_cuda()` is initialized to be consistent with the current behavior: if `DLIB_USE_CUDA` is defined, `use_cuda()` returns `true` by default.

I found while making these changes that it was necessary to keep the preprocessor directives, because the `cuda` namespace cannot be referenced when CUDA is not linked. Additionally, the pooling and convolution classes in `tensor_tools.h` have instances of both implementations as members. I couldn't find a cleaner way (with limited time) to handle those cases.