Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

Commit

Permalink
virtcontainers: Properly remove the container when shim gets killed
Browse files Browse the repository at this point in the history
Here is an interesting case I have been debugging. I was trying to
understand why a "kubeadm reset" was not working for kata-runtime
compared to runc. In this case, the only pod started with Kata is
the kube-dns pod. For some reasons, when this pod is stopped and
removed, its containers receive some signals, 2 of them being SIGTERM
signals, which seems the way to properly stop them, but the third
container receives a SIGCONT. Obviously, nothing happens in this
case, but apparently CRI-O considers this should be the end of the
container and after a few seconds, it kills the container process
(being the shim in Kata case). Because it is using a SIGKILL, the
signal does not get forwarded to the agent because the shim itself
is killed right away. After this happened, CRI-O calls into
"kata-runtime state", we detect the shim is not running anymore
and we try to stop the container. The code will eventually call
into agent.RemoveContainer(), but this will fail and return an
error because inside the agent, the container is still running.

The approach to solve this issue here is to send a SIGKILL signal
to the container after the shim has been waited for. This call does
not check for the error returned because most of the cases, regular
use cases, will end up returning an error because the shim itself
not being there actually represents the container inside the VM has
already terminated.
And in case the shim has been killed without the possibility to
forward the signal (like described in first paragraph), the SIGKILL
will work and will allow the following call to agent.stopContainer()
to proceed to the removal of the container inside the agent.

Fixes #274

Signed-off-by: Sebastien Boeuf <[email protected]>
  • Loading branch information
Sebastien Boeuf committed Apr 28, 2018
1 parent d4225ed commit 789dbca
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions virtcontainers/container.go
Original file line number Diff line number Diff line change
Expand Up @@ -665,6 +665,16 @@ func (c *Container) stop() error {
}
}

// Force the container to be killed. For most of the cases, this
// should not matter and it should return an error that will be
// ignored.
// But for the specific case where the shim has been SIGKILL'ed,
// the container is still running inside the VM. And this is why
// this signal will ensure the container will get killed to match
// the state of the shim. This will allow the following call to
// stopContainer() to succeed in such particular case.
c.sandbox.agent.killContainer(*(c.sandbox), *c, syscall.SIGKILL, true)

if err := c.sandbox.agent.stopContainer(*(c.sandbox), *c); err != nil {
return err
}
Expand Down

0 comments on commit 789dbca

Please sign in to comment.