This repository has been archived by the owner on May 12, 2021. It is now read-only.

clh: Potential regression test failure VmInfoGet failed #2864

Closed
likebreath opened this issue Jul 28, 2020 · 13 comments
Labels
bug (Incorrect behaviour), needs-review (Needs to be assessed by the team)

Comments

@likebreath
Contributor

Description of problem

As observed on two PRs (#2840 and #2833), the clh-docker CI job is failing on the "run hot plug block devices" test. @egernst also reported a similar VmInfoGet failed error after hotplugging memory with kata+clh.

@jcvenegas and I confirmed the failure is not related to the changes in those PRs, so we believe this is a regression introduced recently. The same CI job was still passing a few days ago on 07/25 [here](http://jenkins.katacontainers.io/job/kata-containers-runtime-ubuntu-1804-PR-cloud-hypeprvisor-docker/141/).

I am opening a dummy PR to verify whether this is actually a regression test failure that escaped previous checks/CI runs.

@likebreath added the bug and needs-review labels on Jul 28, 2020
@likebreath
Contributor Author

/cc @jcvenegas @amshinde @sboeuf @egernst

likebreath added a commit to likebreath/kata-runtime that referenced this issue Jul 29, 2020
It is a potential regression test failure observed as `VmInfoGet failed`
after block device hotplug and memory hotplug.

Fixes: kata-containers#2864

Signed-off-by: Bo Chen <[email protected]>
@likebreath
Contributor Author

Interesting/unexpected experiment results from running the clh-docker CI job on different PRs:
- PR #2865 (dummy PR for debugging this issue) is passing: http://jenkins.katacontainers.io/job/kata-containers-runtime-ubuntu-1804-PR-cloud-hypeprvisor-docker/150/
- PR #2833 (support block device unplug) is now passing (while it was failing earlier today): http://jenkins.katacontainers.io/job/kata-containers-runtime-ubuntu-1804-PR-cloud-hypeprvisor-docker/153/
- PR #2840 (Update qemu-virtiofs to 5.0) is still failing at the same docker test: http://jenkins.katacontainers.io/job/kata-containers-runtime-ubuntu-1804-PR-cloud-hypeprvisor-docker/152/

@sboeuf

sboeuf commented Jul 29, 2020

@likebreath thanks for keeping track of these issues!
As always, I would suggest that we try to reproduce the problem by manually running a Cloud-Hypervisor instance. Unless it's not reproducible because of some weird race condition, this would simplify debugging these issues.
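
For reference, a minimal sketch of what such a manual run could look like, assuming a cloud-hypervisor binary plus a kernel and rootfs image at hand (all paths and sizes below are illustrative, not taken from the CI setup):

cloud-hypervisor \
    --api-socket /tmp/ch.sock \
    --kernel ./vmlinux \
    --disk path=./rootfs.img \
    --cmdline "console=ttyS0 root=/dev/vda rw" \
    --cpus boot=1 --memory size=512M &

# Hot plug a block device, then query the VM info (the API call kata wraps as VmInfoGet):
ch-remote --api-socket /tmp/ch.sock add-disk path=/tmp/hotplug.img
ch-remote --api-socket /tmp/ch.sock info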

@jcvenegas
Member

jcvenegas commented Jul 29, 2020

I still don't have a cloud-hypervisor-only reproducer, but I have now created a script using docker to isolate the issue a bit more.

#!/bin/bash
set -x
set -e

loops=9
loop_list=()

create_loop_devices() {
	for i in $(seq 1 ${loops}); do
		loop_name="loop${i}"
		dd if=/dev/zero of=/tmp/${loop_name} count=1 bs=50M
		printf "g\nn\n\n\n\nw\n" | sudo fdisk /tmp/${loop_name}
		loop_path=$(sudo losetup -fP --show "/tmp/${loop_name}")
		loop_list+=("${loop_path}")

		sudo losetup -j "/tmp/${loop_name}"
	done
}

delete_loop_devices() {
	for p in "${loop_list[@]}"; do
		sudo losetup -d "${p}"
	done
}

create_loop_devices
trap delete_loop_devices EXIT

docker_cmd="docker"
docker_cmd+=" run"
docker_cmd+=" --runtime kata-runtime"
docker_cmd+=" --rm"
for p in "${loop_list[@]}"; do
	docker_cmd+=" --device ${p}"
done
docker_cmd+=" busybox find /dev -name 'loop*'"

while
	eval "${docker_cmd}"
do
	echo ok
done
exit
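
A hedged usage note: assuming the script above is saved as loop-hotplug-repro.sh (hypothetical name), it can be run on a host where docker is configured with the kata-runtime/clh runtime:

chmod +x loop-hotplug-repro.sh
./loop-hotplug-repro.sh
# The while loop keeps re-running the container until one docker run fails;
# when the regression triggers, the failing run is the one that reports "VmInfoGet failed".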

@jcvenegas
Member

Usually, using 9 loop devices makes it easier to reproduce.

@egernst
Member

egernst commented Jul 29, 2020

It sounds like this is a regression in functionality? Can we attempt to bisect? Is this observable on stable?
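
If a reliable reproducer exists, a rough bisect sketch could look like the following, assuming a test script (hypothetical name) that rebuilds and installs kata-runtime at the checked-out commit and exits non-zero when the hotplug failure reproduces:

git bisect start
git bisect bad HEAD
git bisect good <last-known-good-commit>   # e.g. the tree from the passing 07/25 CI run
git bisect run ./test-hotplug.sh
git bisect reset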

@likebreath
Contributor Author

@egernst We are working on a reproducer of the problem on upstream master. Would you please share how you encountered the failure on your setup (w/ memory hotplug)?

@egernst
Member

egernst commented Jul 29, 2020

wget https://github.com/kata-containers/runtime/releases/download/1.12.0-alpha0/kata-static-1.12.0-alpha0-x86_64.tar.xz

On a Kube system w/ Kata runtime classes already installed and kata-deploy already run, update the binaries to the latest master release:

sudo tar -xvf kata-static-1.12.0-alpha0-x86_64.tar.xz -C /

Then, just run a pod:

---
apiVersion: batch/v1
kind: Job
metadata:
  name: kbuild
spec:
  template:
    spec:
      containers:
      - name: kbuild
        image: egernst/kernel-build
        command: ["bash"]
        args: ["-c", "make olddefconfig && make -j4"]
        resources:
          requests:
            cpu: 1
            memory: 5Gi
          limits:
            cpu: 5
            memory: 10Gi
      restartPolicy: Never
      runtimeClassName: kata-clh
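
A hedged usage note: assuming the manifest above is saved as kbuild-job.yaml (hypothetical name), the job can be created and observed with:

kubectl apply -f kbuild-job.yaml
kubectl get pods -w
kubectl describe pod -l job-name=kbuild   # inspect events if the pod fails to start or errors out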

@amshinde
Member

@likebreath @egernst Have we tried against the stable 1.11.2 release to check if we see the issue?

@egernst
Member

egernst commented Jul 30, 2020

@amshinde I am unable to reproduce with 1.11.2

@likebreath
Contributor Author

A quick update.

@jcvenegas and I located the root cause of the VmInfoGet failure from PR #2840, which turned out to be an issue related to the seccomp filtering enforced by cloud-hypervisor (CLH) itself. The http_server thread is killed silently (with signal SIGSYS) by the kernel (auditd) when it uses a syscall that is not listed/allowed in the seccomp filter. Basically, the workload created by @jcvenegas managed to trigger a new syscall (namely mprotect) from the http-server thread of CLH (while none of the workloads in the CLH CI trigger this syscall).

@jcvenegas has submitted a patch to CLH to fix the issue (cloud-hypervisor/cloud-hypervisor#1548), and the patch will be included in clh v0.9.0 (coming out tomorrow). Hopefully, the patch will be enough to cover the system calls required by workloads using clh+kata (w/ virtiofsd 5.0). Note that similar silent failures can be triggered by different workloads (those that trigger new syscalls), which may be the reason for the random/sporadic failures we see in our kata CI.

@sboeuf

sboeuf commented Aug 6, 2020

@likebreath @jcvenegas

Note that similar silent failures can be triggered by different workloads (those that trigger new syscalls), which may be the reason for the random/sporadic failures we see in our kata CI.

Could you document how to quickly check for and identify this kind of issue? The goal is to avoid wasting too much time the next time we run into it, and to quickly identify the missing syscall.
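
A minimal sketch of one way to spot this, assuming access to the host kernel/audit log (the syscall number below is illustrative for x86_64):

sudo dmesg | grep -i seccomp           # or: journalctl -k | grep -i seccomp
# A matching audit line includes "comm=... sig=31 ... syscall=<nr>" (SIGSYS is 31);
# translate the number with the audit userspace tools:
ausyscall 10                           # -> mprotect on x86_64
# From clh v0.9.0 onwards, passing --seccomp=log (see the release notes below)
# logs the would-be-denied syscalls instead of killing the thread.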

jcvenegas added a commit to jcvenegas/runtime that referenced this issue Aug 6, 2020
Highlights for cloud-hypervisor version 0.9.0 include:
virtiofs updated to the new DAX implementation based on qemu 5.0
Fixed random issues caused by seccomp filters

io_uring Based Block Device Support

If the io_uring feature is enabled and the host kernel supports it, then io_uring will be used for block devices. This results in a very significant performance improvement.
Block and Network Device Statistics

Statistics for the activity of the virtio network and block devices are now exposed through a new vm.counters HTTP API entry point. These take the form of simple counters which can be used to observe the activity of the VM.
HTTP API Responses

The HTTP API for adding devices now responds with the name that was assigned to the device as well as the PCI BDF.
CPU Topology

A topology parameter has been added to --cpus which allows the configuration of the guest CPU topology allowing the user to specify the numbers of sockets, packages per socket, cores per package and threads per core.
Release Build Optimization

Our release build is now built with LTO (Link Time Optimization) which results in a ~20% reduction in the binary size.
Hypervisor Abstraction

A new abstraction has been introduced, in the form of a hypervisor crate so as to enable the support of additional hypervisors beyond KVM.
Snapshot/Restore Improvements

Multiple improvements have been made to the VM snapshot/restore support that was added in the last release. This includes persisting more vCPU state and in particular preserving the guest paravirtualized clock in order to avoid vCPU hangs inside the guest when running with multiple vCPUs.
Virtio Memory Ballooning Support

A virtio-balloon device has been added, controlled through the resize control, which allows the reclamation of host memory by resizing a memory balloon inside the guest.
Enhancements to ARM64 Support

The ARM64 support introduced in the last release has been further enhanced with support for using PCI for exposing devices into the guest as well as multiple bug fixes. It also now supports using an initramfs when booting.
Intel SGX Support

The guest can now use Intel SGX if the host supports it. Details can be found in the dedicated SGX documentation.
Seccomp Sandbox Improvements

The most frequently used virtio devices are now isolated with their own seccomp filters. It is also now possible to pass --seccomp=log, which results in the logging of requests that would otherwise have been denied, to further aid development.
Notable Bug Fixes

    Our virtio-vsock implementation has been resynced with the implementation from Firecracker and includes multiple bug fixes.
    CPU hotplug has been fixed so that it is now possible to add, remove, and re-add vCPUs (kata-containers#1338)
    A workaround is now in place for when KVM reports available MSRs that are in fact unreadable, preventing snapshot/restore from working correctly (kata-containers#1543).
    virtio-mmio based devices are now more widely tested (kata-containers#275).
    Multiple issues have been fixed with virtio device configuration (kata-containers#1217)
    Console input was wrongly consumed by both virtio-console and the serial device (kata-containers#1521).

Fixes: kata-containers#2864

Signed-off-by: Jose Carlos Venegas Munoz <[email protected]>
jcvenegas added a commit to jcvenegas/runtime that referenced this issue Aug 11, 2020
Highlights for cloud-hypervisor version 0.9.0 include:
virtiofs updated to the new DAX implementation based on qemu 5.0
Fixed random issues caused by seccomp filters

(Full commit message identical to the one referenced above.)

Fixes: kata-containers#2864

Signed-off-by: Jose Carlos Venegas Munoz <[email protected]>
@amshinde
Member

amshinde commented Aug 13, 2020

Note that, similar silent failures can be triggered with different workloads (that can trigger new syscalls), which can be the reason of random/sporadic failures we see from our kata CI.

@likebreath @jcvenegas Can we throw an explicit error in that case which clearly shows that the system call was not allowed by seccomp?
Looking at the seccomp man page, there are several actions that can be triggered by a seccomp filter. It looks like, in the case of cloud-hypervisor, the action chosen for a disallowed system call is SECCOMP_RET_TRAP, which raises a SIGSYS signal. Is there a way we can catch this signal and return an appropriate error message, so that we do not waste time trying to figure out the failure reason next time?
If signal handling is an issue, I think we can make use of the SECCOMP_RET_ERRNO action, which makes the disallowed syscall return the errno value specified in the SECCOMP_RET_DATA portion of the filter's return value.

Wdyt @sboeuf @rbradford ?
