
Deleting the sandbox fails and resources are left behind when a pod_container type container fails to start #2882

Closed
flyflypeng opened this issue Aug 8, 2020 · 0 comments · Fixed by #2883
Labels: bug (Incorrect behaviour), needs-review (Needs to be assessed by the team.)

@flyflypeng (Contributor)

Description of problem

Deleting the sandbox fails and resources are left behind when a pod_container type container fails to start.

I mocked an error in virtcontainers/sandbox.go as follows, to simulate a failure occurring after c.create():

func (s *Sandbox) CreateContainer(contConfig ContainerConfig) (VCContainer, error) {
	// Create the container.
	......

	err = c.create()
	if err != nil {
		return nil, err
	}

	......

	defer func() {
		// Rollback if error happens.
		if err != nil {
			logger := s.Logger().WithFields(logrus.Fields{"container-id": c.id, "sandbox-id": s.id, "rollback": true})

			logger.Warning("Cleaning up partially created container")

			if err2 := c.stop(true); err2 != nil {
				logger.WithError(err2).Warning("Could not delete container")
			}

			logger.Debug("Removing stopped container from sandbox store")

			s.removeContainer(c.id)
		}
	}()

	// >>>>>>>>>> Mock error here >>>>>>>>>>
	err = fmt.Errorf("just return the error")
	return nil, err

	// Sandbox is responsible to update VM resources needed by Containers
	// Update resources after having added containers to the sandbox, since
	// container status is required to know if more resources should be added.
	err = s.updateResources()
	if err != nil {
		return nil, err
	}

	......
}
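For readers unfamiliar with the pattern: the rollback relies on the deferred closure reading the surrounding err variable at the moment the function returns, which is why the mocked error above is enough to trigger it. A minimal, standalone illustration of that mechanism (toy code, not kata's):

package main

import "fmt"

func create() (err error) {
	created := false

	defer func() {
		// Rollback if an error happens: the closure reads err at return
		// time, so any failure set below triggers the cleanup.
		if err != nil && created {
			fmt.Println("rolling back partially created resource")
		}
	}()

	created = true
	// Stands in for the mocked failure in CreateContainer above.
	err = fmt.Errorf("just return the error")
	return err
}

func main() {
	fmt.Println("create:", create())
}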

Then I ran the following test:

# step 1: start a pod_sandbox type container
# docker run -tid --runtime kata-runtime --network none --annotation io.kubernetes.docker.type=podsandbox pause
e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb

# step 2: start a pod_container type container in the pod
# docker run -tid --runtime kata-runtime --network none --annotation io.kubernetes.docker.type=container --annotation io.kubernetes.sandbox.id=e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb ubuntu bash
5d978668c7b1bc88fcbe4d017b86ca1b9c42be03166c8b68763b56bb530e089f
docker-origin: Error response from daemon: OCI runtime create failed: just return the error: unknown.

# step 3: forcefully remove the whole pod with docker; this returns an error
# docker rm -f `docker ps -aq`
5d978668c7b1
9ba75fba1faa
Error response from daemon: Could not kill running container e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb, cannot remove - Cannot kill container e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb: unknown error after kill: /var/lib/docker/runtimes/kata-runtime did not terminate sucessfully: open /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/5d978668c7b1bc88fcbe4d017b86ca1b9c42be03166c8b68763b56bb530e089f/config.json: no such file or directory

The docker rm -f command in step 3 returns an error, and the persist store sandbox directory /var/run/vc/sbs/<sandbox-id>, the qemu-kvm process, and the kata-proxy process are left behind on the machine.

Processes left:

# ps -ef | grep e2869daa221f302ca2
root     12488     1  0 16:58 ?        00:00:01 /usr/bin/qemu-kvm -name sandbox-e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb -uuid 7283ce20-9434-4dbc-ae9e-ca033f4874a9 -machine pc,accel=kvm,kernel_irqchip -cpu host, -qmp unix:/run/vc/vm/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/qmp.sock,server,nowait -m 1024M,slots=10,maxmem=387231M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/console.sock,server,nowait -device virtio-scsi-pci,id=scsi0,disable-modern=false,romfile= -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0,romfile= -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/vm/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/kata.sock,server,nowait -device virtio-9p-pci,disable-modern=false,fsdev=extra-9p-kataShared,mount_tag=kataShared,romfile= -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/shared,security_model=none -rtc base=utc,driftfix=slew,clock=host -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic --no-reboot -daemonize -object memory-backend-ram,id=dimm1,size=1024M -numa node,memdev=dimm1 -kernel /var/lib/kata/kernel -initrd /var/lib/kata/kata-containers-initrd-github.img -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 cryptomgr.notests net.ifnames=0 pci=lastbus=0 iommu=off debug panic=1 nr_cpus=40 agent.use_vsock=false scsi_mod.scan=none agent.log=debug agent.log=debug agent.netlink_recv_buf_size=2MB -pidfile /run/vc/vm/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/pid -D /run/vc/vm/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/qemu.log -smp 1,cores=1,threads=1,sockets=40,maxcpus=40
root     12493     1  0 16:58 ?        00:00:00 /usr/bin/kata-proxy-github -listen-socket unix:///run/vc/sbs/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/proxy.sock -mux-socket /run/vc/vm/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/kata.sock -sandbox e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb -log debug -agent-logs-socket /run/vc/vm/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/console.sock

Persist store directory left:

/var/run/vc/sbs/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb # ls
5d978668c7b1bc88fcbe4d017b86ca1b9c42be03166c8b68763b56bb530e089f  e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb  persist.json  proxy.sock

Mount points left:

# mount | grep e2869daa221f302ca2
tmpfs on /run/kata-containers/shared/sandboxes/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/shared type tmpfs (ro,mode=755)
/dev/mapper/cpsVG-rootfs on /run/kata-containers/shared/sandboxes/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/mounts/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb-bb88f5c665b8b013-resolv.conf type ext4 (rw,relatime,stripe=64,data=ordered)
/dev/mapper/cpsVG-rootfs on /run/kata-containers/shared/sandboxes/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/shared/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb-bb88f5c665b8b013-resolv.conf type ext4 (rw,relatime,stripe=64,data=ordered)
/dev/mapper/cpsVG-rootfs on /run/kata-containers/shared/sandboxes/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/mounts/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb-b42f0cbdbe86b606-hostname type ext4 (rw,relatime,stripe=64,data=ordered)
/dev/mapper/cpsVG-rootfs on /run/kata-containers/shared/sandboxes/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/shared/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb-b42f0cbdbe86b606-hostname type ext4 (rw,relatime,stripe=64,data=ordered)
/dev/mapper/cpsVG-rootfs on /run/kata-containers/shared/sandboxes/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/mounts/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb-913ee122772b6898-hosts type ext4 (rw,relatime,stripe=64,data=ordered)
/dev/mapper/cpsVG-rootfs on /run/kata-containers/shared/sandboxes/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb/shared/e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb-913ee122772b6898-hosts type ext4 (rw,relatime,stripe=64,data=ordered)
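Until a fix lands, the leftovers can be cleaned up by hand. An illustrative sequence, assuming the paths and IDs shown in the output above (use with care):

# sid=e2869daa221f302ca23f2dae8a6a0b473386d7eae40f46a123f7f9b21bbabfbb
# pkill -f "$sid"                                                      # kill the leftover qemu-kvm and kata-proxy processes
# mount | grep "$sid" | awk '{print $3}' | tac | xargs -r -n 1 umount  # unmount in reverse mount order
# rm -rf "/var/run/vc/sbs/$sid" "/run/vc/vm/$sid"                      # drop the persist store and VM directories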

Expected result

The whole sandbox is removed successfully, with no resources left behind on the machine, after a pod_container type container fails to start.

Actual result

As described in the problem description above.

flyflypeng added the bug (Incorrect behaviour) and needs-review (Needs to be assessed by the team.) labels on Aug 8, 2020
flyflypeng added a commit to flyflypeng/runtime that referenced this issue on Aug 8, 2020:

fixes: kata-containers#2882

reason: If an error happens after the container is created and before the sandbox updateResources() call in `CreateContainer()`, forcefully deleting the sandbox returns an error because the s.config.Containers config was not flushed to the persist store.

Signed-off-by: jiangpengfei <[email protected]>
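To illustrate the ordering problem the commit message describes, here is a toy sketch (not kata code, and not the actual #2883 diff): state appended only to an in-memory list and flushed at the end of the function is lost when an error short-circuits it, so a later forced cleanup cannot see the partially created container. Flushing immediately after the append keeps the persist store consistent:

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type sandboxState struct {
	Containers []string `json:"containers"`
}

// flush writes the sandbox state to the persist store (a JSON file here).
func flush(path string, st *sandboxState) error {
	data, err := json.Marshal(st)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

func createContainer(path string, st *sandboxState, id string) error {
	// Append the new container to the in-memory sandbox config.
	st.Containers = append(st.Containers, id)

	// Flush before any step that can fail, so the persist store always
	// reflects the container and a forced delete can find and clean it up.
	if err := flush(path, st); err != nil {
		return err
	}

	// Stands in for the mocked error before updateResources().
	return fmt.Errorf("just return the error")
}

func main() {
	st := &sandboxState{}
	if err := createContainer("persist.json", st, "5d978668c7b1"); err != nil {
		fmt.Println("create failed:", err)
	}
	data, _ := os.ReadFile("persist.json")
	fmt.Println("persisted state:", string(data)) // the container is still visible
}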
flyflypeng added two more commits to flyflypeng/runtime that referenced this issue on Aug 11, 2020, and jcvenegas pushed three commits to jcvenegas/runtime that referenced this issue on Oct 19 and Oct 20, 2020, all with the same commit message as above.