Commit c5a3e9d

Revert "plugins/amdgpu: Implement parallel restore"
This functionality (#2527) is being reverted and excluded from this release due to issue #2812. It will be included in a subsequent release once all associated issues are resolved.
1 parent cb8e1da commit c5a3e9d

8 files changed: 51 additions, 770 deletions

Documentation/criu-amdgpu-plugin.txt

Lines changed: 0 additions & 1 deletion
@@ -15,7 +15,6 @@ Checkpoint / Restore inside a docker container
 Pytorch
 Tensorflow
 Using CRIU Image Streamer
-Parallel Restore
 
 DESCRIPTION
 -----------

plugins/amdgpu/Makefile

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ endif
 criu-amdgpu.pb-c.c: criu-amdgpu.proto
 	protoc --proto_path=. --c_out=. criu-amdgpu.proto
 
-amdgpu_plugin.so: amdgpu_plugin.c amdgpu_plugin_drm.c amdgpu_plugin_topology.c amdgpu_plugin_util.c criu-amdgpu.pb-c.c amdgpu_socket_utils.c
+amdgpu_plugin.so: amdgpu_plugin.c amdgpu_plugin_drm.c amdgpu_plugin_topology.c amdgpu_plugin_util.c criu-amdgpu.pb-c.c
 	$(CC) $(PLUGIN_CFLAGS) $(shell $(COMPEL) includes) $^ -o $@ $(PLUGIN_INCLUDE) $(PLUGIN_LDFLAGS) $(LIBDRM_INC)
 
 amdgpu_plugin_clean:

plugins/amdgpu/README.md

Lines changed: 1 addition & 22 deletions
@@ -3,8 +3,7 @@ Supporting ROCm with CRIU
 
 _Felix Kuehling <Felix.Kuehling@amd.com>_<br>
 _Rajneesh Bardwaj <Rajneesh.Bhardwaj@amd.com>_<br>
-_David Yat Sin <David.YatSin@amd.com>_<br>
-_Yanning Yang <yangyanning@sjtu.edu.cn>_
+_David Yat Sin <David.YatSin@amd.com>_
 
 # Introduction
 
@@ -225,26 +224,6 @@ to resume execution on the GPUs.
 *This new plugin is enabled by the new hook `__RESUME_DEVICES_LATE` in our RFC
 patch series.*
 
-## Restoring BO content in parallel
-
-Restoring the BO content is an important part in the restore of GPU state and
-usually takes a significant amount of time. A possible location for this
-procedure is the `cr_plugin_restore_file` hook. However, restoring in this hook
-blocks the target process from performing other restore operations, which
-hinders further optimization of the restore process.
-
-Therefore, a new plugin hook that runs in the master restore process is
-introduced, and it interacts with the `cr_plugin_restore_file` hook to complete
-the restore of BO content. Specifically, the target process only needs to send
-the relevant BOs to the master restore process, while this new hook handles all
-the restore of buffer objects. Through this method, during the restore of the BO
-content, the target process can perform other restore operations, thus
-accelerating the restore procedure. This is an implementation of the gCROP
-method proposed in the ACM SoCC'24 paper: [On-demand and Parallel
-Checkpoint/Restore for GPU Applications](https://dl.acm.org/doi/10.1145/3698038.3698510).
-
-*This optimization technique is enabled by the `__POST_FORKING` hook.*
-
 ## Other CRIU changes
 
 In addition to the new plugins, we need to make some changes to CRIU itself to
