Description
Dear CRIU team,
When I use CRIU 3.19 on Debian bookworm to checkpoint a container, approximately 5% of the attempts fail with the following error messages in the log:
(00.246054) Parasite syscall_ip at 0x40002c0be000
(00.278562) Set up parasite blob using memfd
(00.278627) Putting parasite blob into 0xffff81c9f000->0x400137d16000
(00.278760) Dumping GP/FPU registers for 15222
(00.278795) Putting tsock into pid 15222
(00.279009) Wait for parasite being daemonized...
(00.279028) Wait for ack 2 on daemon socket
(00.279173) Error (criu/parasite-syscall.c:88): si_code=4 si_pid=15222 si_status=4
(00.279198) Error (criu/parasite-syscall.c:95): 15222 was stopped by 4 unexpectedly
According to the comments in parasite-syscall.c and the log output, this indicates that a SIGILL occurred while executing the parasite code.
The process running in the container has a custom signal handler installed for nearly all signals, including SIGILL. After CRIU has put the parasite blob into the process, will the old signal handler still be triggered? I am asking because, if that is the case, I could set a breakpoint there with a hardware debugger (I am working with an ARM SoC) to look at the stack trace and determine what caused the SIGILL.
Do you have any other suggestions or insights on how I can proceed to debug and resolve this problem?
Thanks in advance and regards
gspecht478