-
Notifications
You must be signed in to change notification settings - Fork 15
Use CUDA events instead of CUDA device/stream synchronization #225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 14 commits
2c0dac4
1202c50
bc40a83
6954a34
c3b04e3
ecb9e2d
fa236cb
e393aa0
c23e13f
33015af
2699326
23673ab
1180b8e
aa2ade6
7714a6f
eded688
8887137
0bac2a6
5f04b7a
01b2717
95cb163
42d2b9f
f019d8d
0226f52
a56070c
aaad38c
02b6c01
1f41e92
4ae9a4e
63f819e
71a8786
3055053
725c3aa
21f04a1
11bac26
dfd6e70
6795eff
dc1d898
b448c7e
007a536
8f6ecaf
10fc871
e2d8c9c
27f856c
640c2bf
40ebb2a
3f65ea3
17c01e2
8627cfc
afd84a1
cb0b159
1f2f966
de3db36
294d5b9
6a1ae03
16c8b17
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,8 @@ | |
#include <variant> | ||
#include <vector> | ||
|
||
#include <cuda_runtime.h> | ||
|
||
#include <rmm/device_buffer.hpp> | ||
|
||
#include <rapidsmpf/error.hpp> | ||
|
@@ -122,12 +124,27 @@ class Buffer { | |
); | ||
} | ||
|
||
/** | ||
* @brief Check if the last copy operation has completed. | ||
* | ||
* @return true if the copy operation has completed or no copy operation | ||
* was performed, false if it is still in progress. | ||
*/ | ||
[[nodiscard]] bool is_copy_complete() const; | ||
|
||
/// @brief Buffer has a move ctor but no copy or assign operator. | ||
Buffer(Buffer&&) = default; | ||
Buffer(Buffer const&) = delete; | ||
Buffer& operator=(Buffer& o) = delete; | ||
Buffer& operator=(Buffer&& o) = delete; | ||
|
||
/** | ||
* @brief Destructor for Buffer. | ||
* | ||
* Cleans up any allocated resources. | ||
*/ | ||
~Buffer(); | ||
|
||
private: | ||
/** | ||
* @brief Construct a Buffer from host memory. | ||
|
@@ -208,6 +225,8 @@ class Buffer { | |
/// @brief The underlying storage host memory or device memory buffer (where | ||
/// applicable). | ||
StorageT storage_; | ||
/// @brief CUDA event used to track copy operations | ||
cudaEvent_t cuda_event_; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure I really like this design. This produces one event per buffer that we create whereas really we only need one event per stream that we see. Very minimally, we should definitely create the events with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That would only shift what we need to track, wouldn't it? If we had one event per stream, how would we know if a buffer was created before or after the event? From the check's perspective it could have happened either before or after and we would go back to have potentially invalid memory accesses.
Thanks for the suggestion, I'll do that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
no? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, and this is why we have one event per buffer. If we had one event per stream we could end up with:
That is what I don't think will work, or are you suggesting something different and I misunderstood the suggestion? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is now done in 1180b8e . There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But in that case aren't we just blocking until all buffers are completed? This would potentially prevent us from progressing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An additional complication is that you would also need to keep track of the streams and events globally, meaning we probably wouldn't be able to get an arbitrary |
||
}; | ||
|
||
} // namespace rapidsmpf |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -116,6 +116,7 @@ std::unique_ptr<Communicator::Future> MPI::send( | |
std::unique_ptr<Communicator::Future> MPI::send( | ||
std::unique_ptr<Buffer> msg, Rank rank, Tag tag | ||
) { | ||
RAPIDSMPF_EXPECTS(msg->is_copy_complete(), "buffer copy has not completed yet"); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unfortunately, you're right. It is unfortunate that you're right because this means there's no good way to push this check onto the communicator and we thus have to force the caller to ensure that. I'll revert those changes and update the docstrings to make the contract to the caller clear: the caller needs to ensure the allocation and data are ready. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done in eded688 . |
||
RAPIDSMPF_EXPECTS( | ||
msg->size <= std::numeric_limits<int>::max(), | ||
"send buffer size exceeds MPI max count" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "copy" is, I think, the wrong phrasing. I think you mean "has any stream-ordered work to allocate this buffer completed".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this only applies to
Buffer::copy
at the moment, making this more general would IMO imply that it is also relevant for the constructor that just allocates without copies.