-
Notifications
You must be signed in to change notification settings - Fork 907
Complete rework of the UCT BTL #13335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
It is possible that the current memory domain does not have an adequate transport for forming endpoint to endpoint connections. When this is the case the btl will fail to function. To support these situations this CL adds support for using an alternate transport (usually tcp) which can be used to make the endpoint connections. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Hello! The Git Commit Checker CI bot found a few problems with this PR: c089b67: btl/uct: move device context code to a new file
b3c3646: btl/uct: complete rework of descovery and initiali...
114c30a: btl/uct: move uct_component from the module to mca...
9abd9a1: btl/uct: keep UCT component list around after init...
3a47090: btl/uct: put active tls in a list on the mca_btl_u...
ab6f5be: btl/uct: module md_name to mca_btl_uct_md_t
dcdf135: btl/uct: remove the need for a BTL module for conn...
53d1b4d: btl/uct: move the async context from the module to...
5a4f8a0: btl/uct: downgrade endpoing lock from recursive
7b87df1: btl/uct: fix some errors around connection endpoin...
5fb0d62: btl/uct: change mca_btl_uct_tl_t uct_dev_contexts ...
a0f60ae: btl/uct: move tl attributes off of tl context stru...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
In theory the tl attributes do not differ betweeen contexts so query them once when the tl is created not once per context. This removes the need to allocate the first context so that code has also been removed. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
The btl always allocates the maximum number of contexts. This is not a significant amount of memory. Rather than reduce it to be based on the configured maximum number of contexts it makes sense to just make it an array and remove the extra indirection when accessing the contexts. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit fixes a couple of bugs discovered using rma-mt: - Do not call mca_btl_uct_endpoint_set_flag before sending a message on the connection endpoint. This method may cause the release of the connection endpoint (cached on the BTL endpoint). If this happens it would lead to a SEGV. - Flush the endpoint only when it is being released. There is no need to do so on every send. Releasing the endpoint without flushing it may lead to it being destroyed while still processing data. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Recursive locks are not needed for the endpoint lock. This commit modifies the lock to be opal_mutex_t. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
There is no real benefit from sharing the async context between tls. Given this and some other changes that will be made it makes sense to move it from the module to the tl. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Connection TLs are only used to form connections for connect-to-endpoint TLs. They do not need to belong to the same memory domain as the one they are used with so there is no need to rely on a BTL module. This commit moves the pending_connection_reqs to the tl and changes the code to support a NULL module for the connection tl. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
More cleanup preparing to separate connection-only tls from their module. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This simplifies the code a bit by moving mca_btl_uct_tl_t ownership to the mca_btl_uct_md_t class. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
From what I can tell it is not ok to be releasing this list while the components within the list are still in use. To be safe keep the list around until the uct component is closed. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
There is an issue with btl/uct which prevents the usage of the standard btl_uct_ MCA variables (eager limit, flags, etc). Because of the way the btl was written these values are all determined directly from UCT and can not be changed using the MCA variable interface. To address this issue this commit breaks apart the initialization code and separates out the pieces that are necessary for discovery only. The discovery pieces now use a new set of variables that include the memory domain name and directly control the behavior for BTLs on that memory domain as well as enabling the usage of the btl_uct variable to control the defaults for these variables. Example, using memory domain irdma0 will create variables: btl_uct_irdma0_eager_limit, btl_uct_irdma0_max_send_size, etc. The defaults will be based on what is reported by UCT and the user can set the values to a subset of what UCT reports. For example, if the max send size for the hardware is 8192B then it can be set to anything up to and including that value. The same is true for feature flags, if the hardware supports only some btl atomics or operations the user can specify a subset of them (others will be ignored). Signed-off-by: Nathan Hjelm <hjelmn@google.com>
There is a specific header for device contexts so it makes sense to move the context-specific code to a matching C file. No changes in this other than moving code around. Signed-off-by: Nathan Hjelm <hjelmn@google.com>
c089b67
to
349db3b
Compare
I know this will be a bit difficult to review. Looking at how I might be able to make it more reviewable. The first and second commit might be able to go in on their own but I improved the "used a different tl" logic in another commit. Was kind of hard working on this with the VMs being deleted once every 5 days and doing more defensive commits than having a good breakdown. I can probably split off creating the new logic vs removing the old. Will see. What I really need is some verification of this on Mellanox hardware. This code greatly improves what we need for our RoCE but I don't want to break IB and I don't have anything to test with. |
This PR contains a number of changes related to the UCT btl, these include: