-
Notifications
You must be signed in to change notification settings - Fork 678
Dmabuf IPC support for amdgpu plugin (v2) #2613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: wip/amdgpu
Are you sure you want to change the base?
Conversation
|
A friendly reminder that this PR had no activity for 30 days. |
|
@fdavid-amd how is it going? Let me know if you need help with this pr. |
b057779 to
456978a
Compare
amdgpu represents allocated device memory as a memory mapping of the device file. This is a non-standard VMA that must be handled by the plugin, not the normal VMA code. Ignore all VMAs on device files. Signed-off-by: David Francis <David.Francis@amd.com>
amdgpu dmabuf CRIU requires the ability of the amdgpu plugin to retry. Change files_ext.c to read a response of 1 from a plugin restore function to mean retry. Signed-off-by: David Francis <David.Francis@amd.com>
45f796c to
878a313
Compare
|
@avagin |
|
Some of the build failures I don't understand - they might be an issue with this patch set or they might be an issue with the CI. There's a lot of 'Error: ipv6: address already assigned.', which could possibly be a result of the way that I'm allocating the plugin socket but seems unlikely. |
|
@avagin sorry for the repeat ping, but there is something of a rush on my part to get kernel patch https://lists.freedesktop.org/archives/dri-devel/2025-July/514922.html merged in the next week or so (it adds the CHANGE_HANDLE ioctl). As part of merging that patch, I need to demonstrate that it has written and reasonably stable userspace code that uses it (which would be these CRIU patches). It would be a big help to me if you could say if the general approach taken by these patches is acceptable or unacceptable. |
|
@fdavid-amd sorry for the delay. Honestly, I am not a fan of the current approach. I don't understand why the same logic cannot be implemented in the plugin code. It could be something like this: https://gist.github.com/avagin/4484274ec19ac10a8966b5b2800533c0. |
3d40e7a to
6cfa960
Compare
|
I see. Updated with that approach |
|
LGTM |
|
What are the next steps to merging this? |
|
@fdavid-amd I've rebased the patches in this PR on the criu-dev branch to run the CI tests. |
|
@fdavid-amd I used clang-format to reformat C files and fixed one issue that was a root cause of CI failures. I pushed all changes in https://github.com/checkpoint-restore/criu/tree/wip/amdgpu. Let know if I messed up something. |
|
Oh wait |
258f378 to
01792f8
Compare
|
Now fixed |
|
@fdavid-amd Would it be possible to provide more information on how we can test these patches? |
|
|
||
| CR_PLUGIN_HOOK__POST_FORKING = 12, | ||
|
|
||
| CR_PLUGIN_HOOK__RESTORE_INIT = 13, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to update plugins/amdgpu/README.md with information about the new plugin callbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That README doesn't seem to contain information about plugin callbacks; I think the appropriate place for such documentation might be the callback functions themselves or criu-plugin.h
In any case I hope it is clear what RESTORE_INIT does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of this documentation is to help new users to understand how CRIU and the AMD GPU Plugin work. Most users don't read the comments in header files like criu-plugin.h.
amdgpu libraries that use dmabuf fd to share GPU memory between processes close the dmabuf fds immediately after using them. However, it is possible that checkpoint of a process catches one of the dmabuf fds open. In that case, the amdgpu plugin needs to handle it. The checkpoint of the dmabuf fd does require the device file it was exported from to have already been dumped To identify which device this dmabuf fd was exprted from, attempt to import it on each device, then record the dmabuf handle it imports as. This handle can be used to restore it. Signed-off-by: David Francis <David.Francis@amd.com>
01792f8 to
2097325
Compare
@fdavid-amd pls do all changes on top of the wip/amdgpu branch. I spent my time to fix ci tests. |
|
@fdavid-amd I've applied the patches here with @avagin's changes from the |
Add new ioctl DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES. This ioctl returns a list of bos with their handles, sizes, and flags and domains. This ioctl is meant to be used during CRIU checkpoint and provide information needed to reconstruct the bos in CRIU restore. Userspace for this and the next change can be found at checkpoint-restore/criu#2613 Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
Add new GEM_OP_IOCTL option GET_MAPPING_INFO, which returns a list of mappings associated with a given bo, along with their positions and offsets. Userspace for this and the previous change can be found at checkpoint-restore/criu#2613 Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
Add new ioctl DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES. This ioctl returns a list of bos with their handles, sizes, and flags and domains. This ioctl is meant to be used during CRIU checkpoint and provide information needed to reconstruct the bos in CRIU restore. Userspace for this and the next change can be found at checkpoint-restore/criu#2613 Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add new GEM_OP_IOCTL option GET_MAPPING_INFO, which returns a list of mappings associated with a given bo, along with their positions and offsets. Userspace for this and the previous change can be found at: checkpoint-restore/criu#2613 Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch set, in combination with the accompanying kernel patch set, is the
second draft of work to support amdgpu dmabuf IPC. Since the first post, the
patches have been split to make them more readable and made
backwards-compatible.