Description
There are different ways to issue a checkpoint, and users may assume at different points that the image has been written down completely. A natural choice would be to wait for the Java process to finish, but at that moment it is not guaranteed that everything has been written: the `cppath` file is written by the (parent) `criuengine` process, `criu` allegedly writes some files after sending the termination signal, etc.
As an independent issue, when the `JDK.checkpoint` command confirms, `criu` has not even been invoked yet.
This problem is related but not limited to containers, where container termination would wait for all (orphaned) processes to finish, as openjdk/crac#46 ensures. Even in bare-metal setups it is important to be able to find the moment when it is safe to pack the image and send it over the network elsewhere.
One way to solve this is to start an init process in place of the actual java, set an environment variable with its PID that would be inherited by all subprocesses, and run the actual Java application as a subprocess. Every new process that is about to participate in writing the image would send its PID (e.g. using a signal) to the init process, and init itself would not terminate until all these descendants terminate. Alternatively we could just wait for the whole subtree to complete (I am just not sure if there's a reliable way to detect that a process was in the subtree at some point).
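For the "wait for the whole subtree" variant, a rough sketch of what such an init wrapper might look like on Linux is below. It is not the proposed implementation: it swaps the "send your PID by signal" registration for the kernel's child-subreaper feature (`prctl(PR_SET_CHILD_SUBREAPER)`), so that orphaned descendants are reparented to the wrapper and can be waited for. The environment variable name `CRAC_INIT_PID` and the function names are made up for illustration, and whether this covers everything `criuengine` and `criu` do has not been verified.

```python
import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36  # constant from <linux/prctl.h>


def become_subreaper():
    """Ask the kernel to reparent orphaned descendants to this process (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def run_and_wait_for_subtree(argv, env_var="CRAC_INIT_PID"):
    """Run argv as a child and return only after every descendant has exited.

    env_var is a hypothetical name; it just exposes the wrapper's PID to the
    subtree, as suggested in the description above.
    Returns the exit code of the main child (negative signal number if killed).
    """
    become_subreaper()
    env = dict(os.environ)
    env[env_var] = str(os.getpid())
    main_pid = os.posix_spawnp(argv[0], argv, env)

    main_exit = None
    while True:
        try:
            # Reaps the main child and any orphan that was reparented to us.
            pid, status = os.wait()
        except ChildProcessError:
            break  # no children left: the whole subtree is gone
        if pid == main_pid:
            main_exit = os.waitstatus_to_exitcode(status)
    return main_exit
```

The wrapper would be called with the real command line, e.g. `run_and_wait_for_subtree(["java", ...])` with the usual arguments. Once it returns, no process that was ever in the subtree is still running, which would be the point at which packing the image is safe under the assumptions above.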
A separate issue is `jcmd ... JDK.checkpoint`. If the solution above is implemented, the jcmd connection could be handed over to the init process, and only the init process would send the acknowledgement.