
virtcontainers: prepend a kata specific string to host cgroups path #1518

Merged
merged 1 commit into kata-containers:master on Apr 12, 2019

Conversation

lifupan
Member

@lifupan lifupan commented Apr 11, 2019

Prepend a Kata-specific string to the OCI cgroup path to form a different cgroup path, so that cAdvisor cannot find the Kata Containers cgroup path on the host, preventing it from grabbing the stats data.

Fixes: #1488

Signed-off-by: lifupan [email protected]
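
As a minimal sketch of the idea (the constant and helper names below are illustrative assumptions, not necessarily the exact code in this PR): only the final path component is renamed, so the cgroup stays in the same place in the hierarchy.

package main

import "path/filepath"

// cgroupKataPrefix is an assumed name for the Kata-specific marker.
const cgroupKataPrefix = "kata"

// renameCgroupPath prepends the marker to the last component of the host
// cgroup path, leaving the parent hierarchy untouched, so that tools that
// derive the path from the container ID (such as cAdvisor) no longer find it.
func renameCgroupPath(path string) string {
	name := filepath.Base(path)
	parent := filepath.Dir(path)
	return filepath.Join(parent, cgroupKataPrefix+"_"+name)
}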

@lifupan lifupan changed the title virtcontainers: only create host cgroups for podsandbox in shimv2 virtcontainers: add a kata specific appendix to host cgroups path Apr 11, 2019
@lifupan lifupan force-pushed the fixtop branch 3 times, most recently from a95202c to d608e93 Compare April 11, 2019 12:31
@lifupan
Member Author

lifupan commented Apr 11, 2019

/test

@lifupan
Member Author

lifupan commented Apr 11, 2019

@egernst If we accept this fix, we should update the test cases for the cgroup tests.

@egernst
Member

egernst commented Apr 11, 2019

I think this is the best option right now for fixing this. Let's get the tests updated!

Thanks @lifupan

@egernst
Member

egernst commented Apr 11, 2019

/cc @devimc @mcastelino @krsna1729

@devimc

devimc commented Apr 11, 2019

@mocolic does this change work for you? (I don't think so) 🤔

@egernst
Member

egernst commented Apr 11, 2019

@devimc - can you clarify their assumptions on cgroups? How do they use our cgroups?

@mocolic

mocolic commented Apr 11, 2019

@devimc
If this change doesn't honor the cgroup name we pass to Kata, it will break the behavior, so it won't work for us. Our resource governance implementation relies on the cgroup being honored: resource limits are set at the cgroup level, and the expectation is that Kata creates and activates the container inside this cgroup.

However, recent tests have shown another problem with this check-in - #1189
Please let me know if I should open a new issue for this and we can continue there.

I believe the problem is this code snippet in virtcontainers/cgroups.go, since it governs containers even when they do not have constraints, which is not expected:

// use a default constraint for sandboxes without cpu constraints
if period == uint64(0) && quota == int64(0) {
	// set a quota and period equal to the default number of vcpus
	quota = int64(s.config.HypervisorConfig.NumVCPUs) * 100000
	period = 100000
}
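
For example, with NumVCPUs=2 this sets quota=200000 and period=100000, i.e. an implicit constraint of two full cores even for a container that declared no CPU limits.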

In our clusters the default number of vCPUs is -1 (the entire machine), and since a parent cgroup can't have stricter limits than its child cgroup, container activation fails with this error:

"Could not update cgroup /fabric/SingleInstance_0_App1:myCpuServicePkg@3016e3dc-60b9-8746-a897-4f1bb73dd80a@0c665905-47af-8147-b616-8a0c099ca7a5/crio-5e4f661e9035b89d190a34bcec617a75ac0006b47b784dab207f02bda98736d1: write /sys/fs/cgroup/cpu/fabric/SingleInstance_0_App1:myCpuServicePkg@3016e3dc-60b9-8746-a897-4f1bb73dd80a@0c665905-47af-8147-b616-8a0c099ca7a5/crio-5e4f661e9035b89d190a34bcec617a75ac0006b47b784dab207f02bda98736d1/cpu.cfs_quota_us: invalid argument"
Without the code snippet above, our resource governance would have worked out of the box.

@devimc

devimc commented Apr 11, 2019

@mocolic

If this change doesn't honor the cgroup name we pass to Kata, it will break the behavior, so it won't work for us. Our resource governance implementation relies on the cgroup being honored: resource limits are set at the cgroup level, and the expectation is that Kata creates and activates the container inside this cgroup.

yes, with this change the cgroup path won't be honoured (again)

However, recent tests have shown another problem with this check-in - #1189
Please let me know if I should open a new issue for this and we can continue there.

good finding, please file an issue

@mocolic

mocolic commented Apr 11, 2019

@devimc

However, recent tests have shown another problem with this check-in - #1189
Please let me know if I should open a new issue for this and we can continue there.

good finding, please file an issue

Thanks, opened #1521

@egernst
Member

egernst commented Apr 11, 2019

This forces a pretty tight coupling. As a data point: in Kubernetes, the node-level part of the orchestrator (kubelet) creates a cgroup (kubepods) under which it places containers (in this specific instance, a pod cgroup). Kubelet is then able to modify and restrict/enforce limits by modifying the cgroup (kubepods) that it created.

In the other scenario (@mocolic), the upper layer relies on Kata creating container cgroups which the upper layer can then go and modify. This makes for a rather tight coupling, as can be seen in this scenario.

I’m not sure what the right solution is yet, but the coupling isn’t ideal.
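
For reference, the kubelet-managed hierarchy looks roughly like this (a simplified sketch; QoS-class levels such as burstable are omitted):

/sys/fs/cgroup/cpu/kubepods
└── pod<pod-uid>          # pod cgroup created and managed by kubelet
    ├── <container-id-1>
    └── <container-id-2>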

@mocolic

mocolic commented Apr 11, 2019

@egernst

In the other scenario (@mocolic), the upper layer relies on Kata creating container cgroups which the upper layer can then go and modify. This makes for a rather tight coupling, as can be seen in this scenario.

So, we create our cgroup and change limits at that level; we do not modify Kata's child cgroups. The only thing we rely on is that the container will be under this parent cgroup (as long as the container cgroup is under /fabric/, the child hierarchy doesn't matter to us). Also, all other container runtimes (Docker/Clear Containers) work this way.
On the other hand, with this solution there is no way to create a container that is ungoverned, right? It is implicitly governed at the whole-machine level, even if the customer doesn't want that.

@egernst
Member

egernst commented Apr 11, 2019

can you provide a complete hierarchy example @mocolic?

It sounds like the name of the cgroup we create doesn't matter, so long as we place it in the correct place in the hierarchy, correct?

With this PR, we place it in the same exact location, but provide a slightly modified name.
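
In other words (assuming a "kata"-style prefix purely for illustration):

before: /sys/fs/cgroup/cpu/fabric/crio-89e60b84
after:  /sys/fs/cgroup/cpu/fabric/kata_crio-89e60b84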

@mocolic

mocolic commented Apr 11, 2019

@egernst

can you provide a complete hierarchy example @mocolic?

It sounds like the name of the cgroup we create doesn't matter, so long as we place it in the correct place in the hierarchy, correct?

Correct.
If we pass a cgroup name to Kata, for example fabric, the hierarchy should look like:
/sys/fs/cgroup/cpu,cpuacct
└── fabric
    ├── crio-89e60b84 # Sandbox cgroupsPath=/sys/fs/cgroup/cpu,cpuacct/fabric/crio-89e60b84
    ├── crio-2a95e0f8 # Container A cgroupsPath=/sys/fs/cgroup/cpu,cpuacct/fabric/crio-2a95e0f8
    └── crio-d037bce7 # Container B cgroupsPath=/sys/fs/cgroup/cpu,cpuacct/fabric/crio-d037bce7

If a parent cgroup is not provided, I expect the hierarchy to look something like this (or whatever naming convention you have):
/sys/fs/cgroup/cpu,cpuacct
├── crio-89e60b84 # Sandbox cgroupsPath=/sys/fs/cgroup/cpu,cpuacct/crio-89e60b84
├── crio-2a95e0f8 # Container A cgroupsPath=/sys/fs/cgroup/cpu,cpuacct/crio-2a95e0f8
└── crio-d037bce7 # Container B cgroupsPath=/sys/fs/cgroup/cpu,cpuacct/crio-d037bce7

One example of the problem:
We set limits on the fabric cgroup (for example 1.3 cores, which translates to cpu.cfs_quota_us = 130000). Say the machine has 2 cores (and, as in our configuration, default_vcpus=-1); the code snippet above will then try to set cpu.cfs_quota_us = 200000 on the child cgroup fabric/crio-d037bce7. Linux doesn't allow this, so container activation fails.
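
A minimal sketch of the failing step, under the assumptions of this example (cgroup v1, paths and values from above; not the actual Kata code):

package main

import (
	"fmt"
	"os"
)

func main() {
	const period = 100000      // CFS period: 100ms
	numVCPUs := int64(2)       // default_vcpus=-1 resolved to the 2 physical cores
	quota := numVCPUs * period // 200000, i.e. two full cores

	// The parent cgroup "fabric" is capped at 1.3 cores (cpu.cfs_quota_us=130000).
	// A child cannot be granted more CPU bandwidth than its parent, so this write
	// is rejected with EINVAL and container activation fails.
	path := "/sys/fs/cgroup/cpu/fabric/crio-d037bce7/cpu.cfs_quota_us"
	if err := os.WriteFile(path, []byte(fmt.Sprintf("%d", quota)), 0644); err != nil {
		fmt.Println(err) // e.g. "write ...: invalid argument"
	}
}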

@devimc

devimc commented Apr 11, 2019

@mocolic

One example of the problem:
We set limits on the fabric cgroup (for example 1.3 cores, which translates to cpu.cfs_quota_us = 130000). Say the machine has 2 cores (and, as in our configuration, default_vcpus=-1); the code snippet above will then try to set cpu.cfs_quota_us = 200000 on the child cgroup fabric/crio-d037bce7. Linux doesn't allow this, so container activation fails.

then the constraint for the container should be -1 (no constraint), right?

@mocolic

mocolic commented Apr 11, 2019

@mocolic

One example of the problem:
We set limits on the fabric cgroup (for example 1.3 cores, which translates to cpu.cfs_quota_us = 130000). Say the machine has 2 cores (and, as in our configuration, default_vcpus=-1); the code snippet above will then try to set cpu.cfs_quota_us = 200000 on the child cgroup fabric/crio-d037bce7. Linux doesn't allow this, so container activation fails.

then the constraint for the container should be -1 (no constraint), right?

When I checked the configuration.toml file, it says this:
# Default number of vCPUs per SB/VM:
# < 0 --> will be set to the actual number of physical cores

If I understood correctly, this means that you would internally change this to the entire machine (to 2), right? Also, when I created a container that requires the whole machine (2 cores are set on the fabric cgroup), the container activation was successful.

@devimc

devimc commented Apr 11, 2019

@mocolic

If I understood correctly, this means that you would internally change this to the entire machine (to 2), right?

yes, and currently it's not working because we try to set a constraint equal to the number of physical resources (in this case 2), but that won't be possible if the parent cgroup is smaller (for example 1.3), so we shouldn't apply a constraint

@mcastelino
Contributor

@mocolic since Kata really does not (and cannot really) impose container-level cgroups unless CPU sets are used, can you expand a little on what your container-level cgroup is expected to contain? We will place the kata-shim in this cgroup, but the actual workloads will be in what we designate as the pod cgroup. So any stats or constraints you apply on the container cgroup will apply to the shim and not the actual workload. Do you place anything else in the container cgroup?

└── fabric
    ├── crio-89e60b84 # Sandbox cgroupsPath=/sys/fs/cgroup/cpu,cpuacct/fabric/crio-89e60b84
    ├── crio-2a95e0f8 # Container A cgroupsPath=/sys/fs/cgroup/cpu,cpuacct/fabric/crio-2a95e0f8
    └── crio-d037bce7 # Container B cgroupsPath=/sys/fs/cgroup/cpu,cpuacct/fabric/crio-d037bce7

@egernst
Member

egernst commented Apr 12, 2019

@fupan approach looks good. Prepend would be better than append imo. Let's get the tests fixed.

@lifupan
Member Author

lifupan commented Apr 12, 2019

@fupan approach looks good. Prepend would be better than append imo. Let's get the tests fixed.

@egernst Got it, I'll change it to prepend and then update the test cases.

prepend a kata specific string to the oci cgroup path to
form a different cgroup path, so that cAdvisor cannot
find the kata containers cgroup path on the host,
preventing it from grabbing the stats data.

Fixes: kata-containers#1488

Signed-off-by: lifupan <[email protected]>
@lifupan lifupan changed the title virtcontainers: add a kata specific appendix to host cgroups path virtcontainers: prepend a kata specific string to host cgroups path Apr 12, 2019
lifupan added a commit to lifupan/tests that referenced this pull request Apr 12, 2019
Kata prepended a kata specific string to the cgroup path name
to prevent cAdvisor from picking up stats data; otherwise, kubelet
will use those "zero" data to override the stats data obtained from
the CRI provider.

Fixes: kata-containers#1461

Depends-on: github.com/kata-containers/runtime#1518

Signed-off-by: lifupan <[email protected]>
@lifupan
Member Author

lifupan commented Apr 12, 2019

The changes for the tests are here: kata-containers/tests#1462
Let's merge it first and then trigger the tests for this PR.

@devimc

devimc commented Apr 12, 2019

/test

@devimc devimc merged commit d99693a into kata-containers:master Apr 12, 2019
@mcastelino
Contributor

@chavafg can we have a test case for this using something along the lines of a pod such as

apiVersion: v1
kind: Pod
metadata:
  name: test-metrics
spec:
  runtimeClassName: kata
  containers:
  - name: cont-2cpu-400m
    image: busybox
    resources:
      limits:
        cpu: 2
        memory: "400Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]
  - name: cont-3cpu-200m
    image: busybox
    resources:
      limits:
        cpu: 3
        memory: "200Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]   

And then check the actual results via kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/test-metrics"

which should yield pretty high CPU utilization, along the lines of:

  "containers": [
    {
      "name": "cont-3cpu-200m",
      "usage": {
        "cpu": "398739040n",
        "memory": "744Ki"
      }
    },
    {
      "name": "cont-2cpu-400m",
      "usage": {
        "cpu": "200848120n",
        "memory": "852Ki"
      }
    }
  ]

@egernst egernst modified the milestones: cgoup-sprint, cgroup-sprint Apr 12, 2019
ganeshmaharaj pushed a commit to ganeshmaharaj/kata-runtime that referenced this pull request Apr 16, 2019
virtcontainers: prepend a kata specific string to host cgroups path
(cherry picked from commit d99693a)

Fixes: kata-containers#1488

Signed-off-by: Ganesh Maharaj Mahalingam <[email protected]>
@egernst egernst mentioned this pull request Apr 16, 2019
@lifupan lifupan deleted the fixtop branch August 22, 2019 09:08
Development

Successfully merging this pull request may close these issues.

Kata metrics broken in Kubernetes v1.14.0 + Containerd 1.2.5 + kata-runtime shimv2 : 1.6.0