SIGILL during checkpoint #2641

@specht478

Dear CRIU team,

When using CRIU 3.19 on Debian bookworm to checkpoint a container, approximately 5% of attempts fail with the following error messages in the log:

(00.246054) Parasite syscall_ip at 0x40002c0be000
(00.278562) Set up parasite blob using memfd
(00.278627) Putting parasite blob into 0xffff81c9f000->0x400137d16000
(00.278760) Dumping GP/FPU registers for 15222
(00.278795) Putting tsock into pid 15222
(00.279009) Wait for parasite being daemonized...
(00.279028) Wait for ack 2 on daemon socket
(00.279173) Error (criu/parasite-syscall.c:88): si_code=4 si_pid=15222 si_status=4
(00.279198) Error (criu/parasite-syscall.c:95): 15222 was stopped by 4 unexpectedly

According to the comments in parasite-syscall.c and the log output, this indicates that a SIGILL occurred while executing the parasite code.

The process running in the container installs a custom signal handler for nearly all signals, including SIGILL. After CRIU puts the parasite blob into the process, will that signal handler still be triggered? I am asking because, if so, I could set a breakpoint there with a hardware debugger (I am working with an ARM SoC), look at the stack trace, and determine what caused the SIGILL.
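One thing that bears on this question is general ptrace behaviour: a signal sent to a traced task is reported to the tracer first, and the task's own handler runs only if the tracer passes the signal on when resuming it. The sketch below shows that with a plain `PTRACE_TRACEME` child on Linux. It is an illustration only: the function name `signal_seen_by_tracer` and the exit code 42 are made up, it is not CRIU code, and whether CRIU forwards the signal in this situation is not established here.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Custom handler: only reached if SIGILL is really delivered to the child. */
static void on_sigill(int sig)
{
    (void)sig;
    _exit(42);
}

/*
 * Hypothetical demo: fork a traced child that raises SIGILL and return the
 * signal the tracer observes, or -1 if the child did not stop as expected.
 */
static int signal_seen_by_tracer(void)
{
    int status;
    int seen;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        signal(SIGILL, on_sigill);
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGILL);  /* child enters signal-delivery-stop here */
        _exit(0);
    }

    /* The tracer is notified before the child's handler gets a chance to run. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    seen = WIFSTOPPED(status) ? WSTOPSIG(status) : -1;

    /* Never resume with the signal; kill and reap the child instead. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return seen;
}
```

Here the child stops with SIGILL visible to the parent and is killed before `on_sigill` ever executes, so a breakpoint inside the handler would not be hit in this setup.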

Do you have any other suggestions or insights on how I can debug and resolve this problem?

Thanks in advance and regards
gspecht478
