Skip to content
This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

On-boarding Kata Containers into OpenShift CI #2802

Merged
merged 5 commits into from
Sep 11, 2020

Conversation

wainersm
Copy link
Contributor

This is an initial implementation on kata-containers's side of the requirements to get it on-boarded into the OpenShift CI. On the README file added on this PR there are more detailed technical information about the openshift-ci and how the scripts/files proposed on this PR are used. On openshift-ci's side there will be needed to have openshift/release#9009 merged too. Please, refer to that README (and the openshift-ci documentation pointers it provides) for further information.

In the issue #2527 I describe the ultimate goal with these changes which is to run e2e openshift 4 tests with latest kata-containers code. However this initial implementation provides only a smoke test suite that I used for the purpose of testing the CI code itself. So once this initial backbone is merged I will be able to work on the execution of e2e tests properly saying.

On my development cluster the proposed here pipeline installs kata correctly and the smoke tests pass. However, on the openshift-ci environment the pipeline seems to install kata correctly but the smoke test fails. I've debugged it but I don't have a solution so far and the problem seems alike cri-o/cri-o#2984 .

Also there is a known issue (and a colleague is helping me to solve):

  • Long story short: it fails to build osbuilder because the scripts use docke, instead we will need to use the Dracut method for osbuilder.
  • The long story is: the build_install.sh basically exports the variables to produce the desired kata installation (under runtime/_out/build_install) then calls the tests/.ci/Install_kata.sh script. It tries to fetch pre-built artifacts from Jenkins and when the cache fails it attempts to build them locally. The problem is that in openshift-ci build environment it is not allowed to evoke docker because it is already in a containerized context (i.e. no nested containers allowed).

Regardless the limitations and known issues I explained before, I would like to have a general review on this PR because I've spent too much time on the code and likely there are stuffs that I missed (and I don't want to spend much more time fixing it :)

@wainersm wainersm requested a review from GabyCT August 19, 2020 23:24
@wainersm wainersm requested a review from a team as a code owner August 19, 2020 23:24
@wainersm wainersm requested a review from chavafg August 19, 2020 23:26
wainersm added a commit to wainersm/kc-runtime that referenced this pull request Aug 19, 2020
Enable openshift-ci on runtime repository.

Fixes: kata-containers#2527
Depends-on: github.com/kata-containers/tests#2802

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
wainersm added a commit to wainersm/kc-runtime that referenced this pull request Aug 19, 2020
Enable openshift-ci on runtime repository.

Fixes: github.com/kata-containers/tests#2527
Depends-on: github.com/kata-containers/tests#2802

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
@wainersm wainersm force-pushed the enable_openshift-ci branch from 0414d9f to ba2695f Compare August 20, 2020 14:52
@wainersm
Copy link
Contributor Author

@GabyCT @chavafg Hi! kata-check-markdown complain about the following but I don't know how to fix. Could you help here?

ERROR: Document .ci/openshift-ci/README.md is not referenced

@chavafg
Copy link
Contributor

chavafg commented Aug 21, 2020

Thanks @wainersm for this PR. To solve the not referenced issue, you can create a reference in the main README.md to the .ci/openshift-ci/README.md file. Maybe adding a small description of the OpenShift CI in the CI Content section of the main readme.
wdyt @jodh-intel ?

@wainersm
Copy link
Contributor Author

Thanks @wainersm for this PR. To solve the not referenced issue, you can create a reference in the main README.md to the .ci/openshift-ci/README.md file. Maybe adding a small description of the OpenShift CI in the CI Content section of the main readme.

Thanks @chavafg ! I've added an entry on .ci/README.md 's scripts table that made this error away.

There is another issue however. On my readme I am referencing urls that should be available only when openshift/release#9009 gets merged. It is a chicken-and-eggs problem. How usually you guys overcome this?

wainersm added a commit to wainersm/kc-runtime that referenced this pull request Aug 21, 2020
Enable openshift-ci on runtime repository.

Fixes: github.com/kata-containers/tests#2527
Depends-on: github.com/kata-containers/tests#2802

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
@jodh-intel
Copy link
Contributor

Hi @wainersm - thanks for raising! Travis is failing as the new file isn't referenced by any others:

ERROR: Document .ci/openshift-ci/README.md is not referenced

Please add a link to the new file in the .ci/README.md file for example (or the top-level README.md?)

Note that looking at the doc, there are some words our spell checker will complain about. Once you've re-pushed Travis will run that for you, but you might want to run the spell checker locally until you get a clean pass.

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising @wainersm. A few comments and suggestions...

@@ -0,0 +1,63 @@
# OpenShift CI files

The OpenShift CI (openshift-ci) allows for components to be built and tested in clusters provided by the OpenShift community. An entitled component should follow the [onboarding process](https://github.com/openshift/ci-tools/blob/master/ONBOARD.md) which involve its configuration in the [ci-operator](https://github.com/openshift/ci-operator/blob/master/README.md). So this directory contains scripts and files used by the OpenShift ci-operator to test kata-containers in that CI environment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: very long lines, which makes diff harder to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm re-writing the README to keep the lines under 80-chars limit.


The ci-operator behavior is defined in yaml files which are placed at the [config directory](https://github.com/openshift/release/tree/master/ci-operator/config) in a per-component/repository/branch basis. This directory is organized such as the operator can infer the components and their repositories that should be tested. And the yaml configuration files should be named in a way that it knows the branch as well.

Therefore, there is the *kata-containers* directory (find it [here](https://github.com/openshift/release/tree/master/ci-operator/config/kata-containers)) to host the configuration files for our project. The configuration files are named as `kata-containers/<repository>/kata-containers-<repository>-<branch>.yaml`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spell checker won't like this I think so I'd suggest changing to kata-containers/.

@@ -0,0 +1,63 @@
# OpenShift CI files

The OpenShift CI (openshift-ci) allows for components to be built and tested in clusters provided by the OpenShift community. An entitled component should follow the [onboarding process](https://github.com/openshift/ci-tools/blob/master/ONBOARD.md) which involve its configuration in the [ci-operator](https://github.com/openshift/ci-operator/blob/master/README.md). So this directory contains scripts and files used by the OpenShift ci-operator to test kata-containers in that CI environment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ci-operator" isn't a dictionary word to you either need to add it to our spell checker dictionary (see https://github.com/kata-containers/tests/tree/master/cmd/check-spelling), or -- simpler -- just refer to it as "CI operator" instead maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I changed the README to use "CI operator" instead.


By the means of [templates](https://github.com/openshift/release/tree/master/ci-operator/templates) the ci-operator allows for preparing the test environment. Commonly those templates are made to instruct the operator on how to deploy and configure OpenShift in a given cloud provider, then run end-to-end tests.

In special for kata-containers the used template needs to enable nested virtualization in the test cluster. See the template currently used in the kata-containers's ci-operator configuration file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project should always be referred to as "Kata Containers" in documentation. But here I think you are referring to the directory? In which case, put it in back-ticks: kata-containers/.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-wrote the README to use the project's name correctly as "Kata Containers".

ds_pods=($(oc get pods | awk '{if ($1 ~ "kata-deploy") { print $1 } }'))
cnt=5
while [[ ${#ds_pods[@]} -gt 0 && $cnt -ne 0 ]]; do
sleep 120
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a long time; does it need to be this long?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this script in OpenShift CI many times, the kata-deploy daemonset never finished before 2min. Usually it takes 3 iterations to finish (~6 min). So I don't think 2 min on this sleep is a long time.

sleep 120
for i in ${!ds_pods[@]}; do
info "Check daemonset ${ds_pods[i]} is running"
rc=$(oc exec ${ds_pods[i]} -- cat /tmp/kata_install_status 2> /dev/null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: odd indentation (tabs vs. spaces?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rewrite the script to fix those tabs vs spaces and long-than-80-chars lines. Thanks!

oc logs ${ds_pods[i]}
unset ds_pods[i]
else
info "Running"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing break?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need a break here. I check the number of running ds_pods in the begin of the loop.

oc describe pods $p
oc logs $p
done
# TODO: exit here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please resolve and remove the comment, or add a ref to a GitHub issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# Apply the CRI-O configuration
info "Configuring kata runtime for CRI-O"
if [ -z "$KATA_CRIO_CONF_BASE64" ]; then
export KATA_CRIO_CONF_BASE64=$(echo $(cat $configs_dir/crio_kata.conf|base64) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look a bit like lisp :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I found it a little bit uncommon (so to speak). But I wanted the config files versioned in plain-text so that any change would be properly visualized with diff. So the only solution I've found was to convert from text to base64 on run-time. Perhaps the if [ -z "$KATA_CRIO_CONF_BASE64" ]; then is not needed....

wainersm added a commit to wainersm/kc-runtime that referenced this pull request Aug 24, 2020
Enable openshift-ci on runtime repository.

Fixes: github.com/kata-containers/tests#2527
Depends-on: github.com/kata-containers/tests#2802

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
@wainersm wainersm force-pushed the enable_openshift-ci branch from ba2695f to 83c1aa5 Compare August 26, 2020 12:50
@wainersm wainersm changed the title Onboarding Kata-containers into OpenShift CI On-boarding Kata-containers into OpenShift CI Aug 26, 2020
@wainersm wainersm changed the title On-boarding Kata-containers into OpenShift CI On-boarding Kata Containers into OpenShift CI Aug 26, 2020
@wainersm
Copy link
Contributor Author

Hi @jodh-intel , thanks for your careful initial review! I tried to address as much as possible the comments you made on this new version. One important thing I added was the logic to wait the worker nodes to reboot after a change in the system configuration.

I see that Travis warns me about missing license on some files, I will address it in a following version.

v1 -> v2:

  • run_smoke_test.sh
    • Correctly handle waitForProcess call
    • Increased timeout to 600s
    • Broke long lines
  • cluster/install_kata.sh
    • Added the code to wait the workers to reboot (wait_for_reboot())
    • Fixed tab vs whitespace
    • Broke long lines
  • README.md
    • Added toc
    • Fix 'readme not referenced' error
    • Fixed invalid urls
    • Now spell-checker likes me
  • build_install.sh

@wainersm wainersm force-pushed the enable_openshift-ci branch from 83c1aa5 to 5ec7e70 Compare August 27, 2020 00:53
@wainersm
Copy link
Contributor Author

v2 -> v3:

  • Added missing license header

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wainersm. A few more comments.

These files can be automatically generated from the CI operator configuration
files as follows:

```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You could specify ```sh or ```bash to get syntax highlighting here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


1. `pipeline:src` - wraps the source repository
* The source repository is copied to `$KATA_SRC/<repo>` in the image. Assume
that `$KATA_SRC` is `/go/github.com/src/kata-containers`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing periods at the end of the full sentences in these bullets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


## Install Kata Containers

In this stage the test cluster should get Kata Containers installed in. In other
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/In other to achieve/In order to achieve/ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yet another good catch. thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

to the test cluster. Then the `$KATA_SRC/tests/.ci/openshift-ci/test.sh` script
is called, but it delegates the installation process to the `$KATA_SRC/tests/.ci/openshift-ci/cluster/install_kata.sh` script.

In a nutshell, it is used a Kubernetes *DaemonSet* to install Kata Containers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a nutshell

🌰 This might be confusing for non-native English speakers 🌰 <-- That's a nut folks, not what you might be thinking! 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehehehehe Good to know. I always assumed 'in a nutshell' were an expression widely understood in technical communities. Maybe because the first computer books I got access were the ' in a nutshell' series of O'REILLY? Oh, I am getting old... :D

# Wait all worker nodes reboot.
#
# Params:
# $1 - timeout in seconds.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (defensive programming): We try to check params at the start of a function:

wait_for_reboot() {
    local timeout="${1:-}"

    [ -z "$timeout" ] && die "need timeout"

    # ...

}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to change to local delta="${1:-1800}". So I don't need to pass a repeated value on the two calls made in this script, yet let the function configurable for future usage.

$(cat $configs_dir/crio_kata.conf|base64) | sed -e 's/\s//g')
fi
envsubst < ${deployments_dir}/machineconfig_kata_runtime.yaml.in | oc apply -f -
if [ $(oc get machineconfigs|grep 50-kata|wc -l) != "0" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line looks potentially fragile - where is 50-kata defined and is it guaranteed to ever change? Maybe "surface" it by creating a readonly variable at the top of the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 50-kata is defined inside ${deployments_dir}/machineconfig_kata_runtime.yaml.in. I just learned that I can get the return code of oc get -f ${deployments_dir}/machineconfig_kata_runtime.yaml.in instead.

envsubst < ${deployments_dir}/machineconfig_selinux.yaml.in | oc apply -f -
oc get machineconfig/51-kata-selinux || die "SELinux machineconfig not found"
# The new SELinux configuration will trigger another reboot.
wait_for_reboot 1800
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: duplicated timeout value - maybe define a variable at the top of the file to avoid duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Now the wait_for_reboot has 1800 as default timeout.

The install_runtime.sh can configure Docker with the kata runtime. In
some circumstances Docker is not installed and so the script fail. Let's
avoid it by checking if Docker is installed.

Fixes kata-containers#2600

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
wainersm added a commit to wainersm/kc-runtime that referenced this pull request Sep 1, 2020
Enable openshift-ci on runtime repository.

Fixes: github.com/kata-containers/tests#2527
Depends-on: github.com/kata-containers/tests#2802

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
@wainersm wainersm force-pushed the enable_openshift-ci branch from 5ec7e70 to df028f0 Compare September 1, 2020 17:41
@wainersm
Copy link
Contributor Author

wainersm commented Sep 1, 2020

@jodh-intel Hi James, version 4 just came out of the oven ;)

v3 -> v4:

  • README.md
    • All suggestions made by James
  • build_install.sh
    • Using the standard QEMU. It was using the experimental (i.e. with
      virtiofs) but in OCP 4.5 the host kernel does not match the minimum version
      required.
  • cluster/install_kata.sh
    • Set default timeout for wait_for_reboot()
    • Removed fragile if [ $(oc get machineconfigs|grep 50-kata|wc -l) != "0" ]; then statement

wainersm added a commit to wainersm/kc-runtime that referenced this pull request Sep 1, 2020
Enable openshift-ci on runtime repository.

Fixes: github.com/kata-containers/tests#2527
Depends-on: github.com/kata-containers/tests#2802

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
@wainersm
Copy link
Contributor Author

wainersm commented Sep 3, 2020

/cc @fidencio

export DESTDIR="$1"
info "Build and install Kata Containers at ${DESTDIR}"

[ "$ID" != "centos" ] && die "Expect the build root to be CentOS"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want to add a comment here explaining why you expect CentOS. I had the same reaction as @jodh-intel when I saw that test.

# The build root container has already golang installed, let's remove it
# so that it will use the version required by kata.
yum remove -y golang
"${cidir}/install_go.sh" -p -f
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions about this

  1. What happens if you try to use the CentOS golang? (just curiosity on my part)
  2. The install script installs go using curl. Should you add yum install curl for safety where you also install sudo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions about this

1. What happens if you try to use the CentOS golang? (just curiosity on my part)

Good question. I don't remember If I ever tried the CentOS golang. What I remember is that something used to fail if the environment had the CentOS golang package installed.

The fact is that I tried to adhere this build script as much as possible with the process used in all the Kata Jenkins jobs. And they don't use the distro's golang, instead it is installed the version suggested (runtime/versions.yaml) via tests/.ci/install_go.sh.

2. The install script installs go using `curl`. Should you add `yum install curl` for safety where you also install `sudo`?

Yes, maybe. In the CentOS image used as build root container has curl installed so I didn't bothered.

# The default /usr prefix makes the deployment on Red Hat CoreOS (rhcos) more
# complex because that directory is read-only by design. Another prefix than
# /opt/kata is problematic too because QEMU experimental eventually got from
# Kata Containers Jenkins is uncompressed in that directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this sentence. Even grammatically.

Do you mean that the CI also installs qemu, and that we need to add our stuff in the same spot (if so, why?). Is it because the prefix will be used for the configuration files, and passing /opt/kata "magically" generates configuration files that point to the right qemu? If so, it really needs to be explained. That's what I understood reading the sentence, but clearly. the sed below adjusts the configuration to point back to /usr/bin. so I'm a bit confused.

From a CI point of view, does that mean that what we test with the CI is not installed in the path where it would be installed on an RHCOS system? What are the implications, if any?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this sentence. Even grammatically.

Reading it again, I even didn't understand what I wrote. :)
I will rephrase it.

Do you mean that the CI also installs qemu, and that we need to add our stuff in the same spot (if so, why?). Is it because the prefix will be used for the configuration files, and passing /opt/kata "magically" generates configuration files that point to the right qemu? If so, it really needs to be explained. That's what I understood reading the sentence, but clearly. the sed below adjusts the configuration to point back to /usr/bin. so I'm a bit confused.

The outcome of buildall_install.sh is a full installation of Kata Containers under runtime/_out/build_install. This is later packaged in a container image along with a install script. By the use of Kubernetes DaemonSets, the content of runtime/_out/build_install is rsync'ed with the node's filesystem.

That said, some facts:

  1. By default Kata Containers build to /usr/ and PREFIX=/usr
  2. On the cluster node (rhcos):
  • /usr is read-only and should not be changed. (see rhcos architecture)
  • /opt is a symlink to /var/opt
  • /var is writable but, for the sake of support, it should be changed only by machineconfig objects.
  1. Kata CI jobs use statically built QEMU rather than of the distros. The CI script will simply fetch and uncompress the tarball in the test machine.
  1. Kata CI jobs also use built kernel (guest OS) and rootfs (initrd or ramdisk)
  • Alike QEMU those files are fetched from Jenkins and uncompressed in the system. Moreover, they will be uncompressed following the PREFIX value.
  1. Based on the value of PREFIX the CI scripts will configure Kata's configuration.toml so that it contain the right paths to QEMU (virtiofsd, if the case), Kernel and rootfs.

In buildall_install.sh script PREFIX is set to /opt/kata so that Kata is built to /opt/kata and then I avoid the fact 2 (/usr being readonly). The kernel and rootfs are properly copied to under /opt/kata (fact 4), the "standard" QEMU goes to /usr (fact 3) -- here is the problem -- and then the configuration.toml file gets mis-configured (fact 5). So I change the configuration.toml to fix that as you can see below.

You might ask:

Q: How do you install QEMU in /usr given it is readonly?
A: Dirty hack. Notice that on runtime/.ci/openshift-ci/images/entrypoint.sh script the /usr is re-mounted as read-write

Q: QEMU is static, so can't you simply copy the binary to /opt/kata/bin?
A: It is possible but there is another issue. It is built the firmwares as well, and the QEMU binary still expect them under /usr. Perhaps that can be fixed by changing to firmware location on configuration.toml too.

Q: Can't it be solved differently then?
A: Yes, if we have a "standard" QEMU built with /opt/kata prefix and cached in Jenkins the problem will likely to be over. Honestly, I plan to work on it after this PR gets merged.

Q: Why don't you use QEMU from rhcos?
A: It is trick and brings in a completely different string of issues. We can discuss that in details later but, briefly speaking, rhcos doesn't package QEMU. We could use, say, the CentOS package but on openshift 4.5 it doesn't install smoothly (conflicts with glusterfs....).

All in all, there are plenty of improvements and important decisions to be taken but I am deferring them to after I get this merged. Because I think what I propose here works pretty much well as the v0 to get Kata on-boarded on Openshift CI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good news. I thought it was only testing the experimental one.

In buildall_install.sh script PREFIX is set to /opt/kata so that Kata is built to /opt/kata and then I avoid the fact 2 (/usr being readonly). The kernel and rootfs are properly copied to under /opt/kata (fact 4), the "standard" QEMU goes to /usr (fact 3) -- here is the problem -- and then the configuration.toml file gets mis-configured (fact 5). So I change the configuration.toml to fix that as you can see below.

A meta Q about that. If RHCOS is supposed to be a read-only system, why does it let you install stuff in /opt at all? 😃

Q: How do you install QEMU in /usr given it is readonly?
A: Dirty hack. Notice that on runtime/.ci/openshift-ci/images/entrypoint.sh script the /usr is re-mounted as read-write

That part I was aware of, but it's definitely useful to mention it in the comments (it's not obvious at all IMO to someone not familiar with RHCOS)

Q: QEMU is static, so can't you simply copy the binary to /opt/kata/bin?
A: It is possible but there is another issue. It is built the firmwares as well, and the QEMU binary still expect them under /usr. Perhaps that can be fixed by changing to firmware location on configuration.toml too.

A longer term solution might be to mount some subset of the container root fs into the target location (to avoid copies while preserving the right paths)

Q: Can't it be solved differently then?
A: Yes, if we have a "standard" QEMU built with /opt/kata prefix and cached in Jenkins the problem will likely to be over. Honestly, I plan to work on it after this PR gets merged.

Sounds good.

Q: Why don't you use QEMU from rhcos?
A: It is trick and brings in a completely different string of issues. We can discuss that in details later but, briefly speaking, rhcos doesn't package QEMU. We could use, say, the CentOS package but on openshift 4.5 it doesn't install smoothly (conflicts with glusterfs....).

That's another Q I did not have personally, but that would be a valuable addition to the commit message.

All in all, there are plenty of improvements and important decisions to be taken but I am deferring them to after I get this merged. Because I think what I propose here works pretty much well as the v0 to get Kata on-boarded on Openshift CI.

I think that your Q / A is an excellent addition to the commit msg. Thanks a bunch.

fi
done

# The standard QEMU cached in Jenkins is built with /usr/bin prefix, thus it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does that qemu come from? Is that the one built with kata, which we are trying to get rid of?

Shouldn't we try to go with a yum install qemu and get the dependencies that way instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered those questions on previous reply.

The OpenShift CI allows for components to be built and tested in clusters
provided by the OpenShift community. This adds the minimum required on kata's side to get it onboarded in that CI system.

More precisely, it introduces the buildall_install.sh script which is used
to build and install kata-containers into a container image.

Fixes kata-containers#2527

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
@wainersm wainersm force-pushed the enable_openshift-ci branch from df028f0 to 26f9262 Compare September 8, 2020 14:20
@wainersm
Copy link
Contributor Author

wainersm commented Sep 8, 2020

v4 -> v5:

  • buildall_install.sh:
    • Comment on why it exists if not running in CentOS.

@wainersm
Copy link
Contributor Author

wainersm commented Sep 8, 2020

@jodh-intel , @c3d , what else should I do to get those changes merged?

@jodh-intel
Copy link
Contributor

@wainersm - thanks for updating. There are a number of unanswered questions on this PR though. I think they should all be relatively quick to resolve but please could you respond to them?

This adds the implementation of a mini-operator which is used to install
kata into the OpenShift test cluster.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
This adds a "Hello World" suite which provides a smoke test for kata
and its installation in the test cluster.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Add the README.md file.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
@wainersm wainersm force-pushed the enable_openshift-ci branch from 26f9262 to 970a589 Compare September 8, 2020 23:31
@wainersm
Copy link
Contributor Author

wainersm commented Sep 8, 2020

v5 -> v6:

  • cluster/install_kata.sh
    • Replaced oc get machineconfig/50-kata to pass the yaml file

@wainersm
Copy link
Contributor Author

wainersm commented Sep 8, 2020

@wainersm - thanks for updating. There are a number of unanswered questions on this PR though. I think they should all be relatively quick to resolve but please could you respond to them?

Hi @jodh-intel , I scan through the question on this PR, and either I answered or change the code properly. Should I "resolve conversation" for every and each conversation? Sorry, I am not used to the review process for Kata... so I might be missing some steps.

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wainersm.

lgtm

@jodh-intel
Copy link
Contributor

/test

@wainersm
Copy link
Contributor Author

wainersm commented Sep 9, 2020

Thanks @wainersm.

lgtm

Thanks @jodh-intel ! I appreciated your careful review!
I will have a look at the jobs failing.

@wainersm
Copy link
Contributor Author

wainersm commented Sep 9, 2020

@devimc Hi Julio, since you reviewed the required changes on runtime side (kata-containers/runtime#2891) ...mind to review this one as well?

Copy link

@devimc devimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - thanks @wainersm

Copy link
Member

@fidencio fidencio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've spoken with Wainer Yesterday and this part is the first thing we need merged.
Once it's merged, we can work on the OpenShift side and then it'll be good to actually start running it.

@wainersm, thanks a ton for the immense work done on this!

@GabyCT, @chavafg, I'd really feel more comfortable if one you would be merging this PR!

@GabyCT GabyCT merged commit 831d1c6 into kata-containers:master Sep 11, 2020
@jodh-intel
Copy link
Contributor

Thanks again for this @wainersm!

wainersm added a commit to wainersm/kc-runtime that referenced this pull request Sep 14, 2020
Enable openshift-ci on runtime repository.

Fixes: github.com/kata-containers/tests#2527
Depends-on: github.com/kata-containers/tests#2802

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants