virtcontainers: prepend a kata specific string to host cgroups path #1518
Conversation
a95202c to d608e93
/test
@egernst If we accept this fix, we should update the test cases for the cgroup tests.
I think this is the best option right now for fixing this. Let's get the tests updated! Thanks @lifupan
@mocolic does this change work for you? (I don't think so) 🤔
@devimc - can you clarify their assumptions on cgroups? How do they use our cgroups?
@devimc However, recent tests have shown another problem with this check-in - #1189. I believe the problem is this code snippet in virtcontainers/cgroups.go, as it will apply cgroup constraints to containers even though they do not have any constraints set, which is not expected.
In our clusters the default number of vcpus is -1 (the entire machine), and since a parent cgroup can't have stricter limits than the child cgroup, container activation fails with this error: "Could not update cgroup /fabric/SingleInstance_0_App1:myCpuServicePkg@3016e3dc-60b9-8746-a897-4f1bb73dd80a@0c665905-47af-8147-b616-8a0c099ca7a5/crio-5e4f661e9035b89d190a34bcec617a75ac0006b47b784dab207f02bda98736d1: write /sys/fs/cgroup/cpu/fabric/SingleInstance_0_App1:myCpuServicePkg@3016e3dc-60b9-8746-a897-4f1bb73dd80a@0c665905-47af-8147-b616-8a0c099ca7a5/crio-5e4f661e9035b89d190a34bcec617a75ac0006b47b784dab207f02bda98736d1/cpu.cfs_quota_us: invalid argument"
yes, with this change the cgroup path won't be honoured (again)
good finding, please file an issue
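For readers following the issue above, a minimal sketch of the kind of guard being discussed, assuming a simplified stand-in for the OCI CPU resources block (the type and function names here are illustrative, not the runtime's actual code):

```go
package main

import "fmt"

// cpuResources mirrors, in simplified form, the CPU section of an OCI
// LinuxResources block; the field and type names here are illustrative,
// not Kata's actual ones.
type cpuResources struct {
	Quota  *int64  // microseconds per period; nil or <= 0 means "no limit"
	Period *uint64 // scheduling period in microseconds
}

// needsCPUConstraint reports whether a host cgroup update should write
// cpu.cfs_quota_us at all: an unconstrained container (no quota, or a
// negative quota) should be left alone, which is the point raised above.
func needsCPUConstraint(cpu *cpuResources) bool {
	if cpu == nil || cpu.Quota == nil {
		return false
	}
	return *cpu.Quota > 0
}

func main() {
	unlimited := int64(-1)
	limited := int64(50000)

	fmt.Println(needsCPUConstraint(nil))                              // false: no CPU block at all
	fmt.Println(needsCPUConstraint(&cpuResources{Quota: &unlimited})) // false: -1 means the whole machine
	fmt.Println(needsCPUConstraint(&cpuResources{Quota: &limited}))   // true: an explicit limit was requested
}
```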
This forces a pretty tight coupling. As a data point: in Kubernetes, the node-level part of the orchestrator/manager (the kubelet) creates a cgroup (kubepods) under which it places containers (in this specific instance, a pod cgroup). The kubelet can then go ahead and restrict/enforce limits by modifying the cgroup (kubepods) that it created. In the other scenario (@mocolic), the upper layer relies on Kata creating the container cgroups, which the upper layer can then go and modify. This makes for a bit of tight coupling, as can be seen in this scenario. I'm not sure what the right solution is yet, but the coupling isn't ideal.
So, we create our cgroup and change limits at that level; we do not modify Kata's child cgroups. The only thing we rely on is that the container will be under this parent cgroup (as long as the container cgroup is under /fabric/, the child hierarchy doesn't matter to us). Also, all other container runtimes (Docker/Clear Containers) work this way.
Can you provide a complete hierarchy example @mocolic? It sounds like the name of the cgroup we create doesn't matter, so long as we place it in the correct place in the hierarchy, correct? With this PR, we place it in the exact same location, but provide a slightly modified name.
Correct. If the parent cgroup is not provided, I expect a hierarchy something like this (or whatever naming convention you have). One of the examples for the problem:
then the constraint for the container should be -1 (no constraint), right?
When I checked the configuration.toml file, it says this: If I understood correctly, this means that you would internally change this to the entire machine (to 2 cores), right? Also, when I created a container that requires the whole machine (2 cores are set on the fabric cgroup), container activation was successful.
Yes, currently it works because we try to set a constraint equal to the number of physical resources (in this case 2), but that won't be possible if the parent cgroup is smaller (for example 1.3), so we shouldn't apply a constraint.
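To make the numbers in this exchange concrete, here is a small sketch of how a vCPU count maps to a CFS quota, and why a machine-default value turned into a hard quota clashes with a tighter parent cgroup; the constant, names, and conversion are assumptions for illustration, not the runtime's actual implementation:

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultCFSPeriod is the usual cgroup v1 CFS period in microseconds.
const defaultCFSPeriod = 100000

// quotaFromVCPUs converts a vCPU count into a cpu.cfs_quota_us value
// (vcpus * period). Name and behaviour are illustrative only.
func quotaFromVCPUs(vcpus float64) int64 {
	return int64(vcpus * defaultCFSPeriod)
}

func main() {
	// default_vcpus = -1 in configuration.toml is resolved to "all host CPUs".
	hostCPUs := float64(runtime.NumCPU())

	// Applying that resolved value as a hard quota (e.g. 2 CPUs -> 200000us)
	// is what the thread reports as failing when the parent cgroup only
	// allows, say, 1.3 CPUs (130000us); hence the conclusion that no
	// constraint should be written unless the container itself asked for one.
	fmt.Println("quota from host CPU count:", quotaFromVCPUs(hostCPUs))
	fmt.Println("quota allowed by a 1.3-CPU parent:", quotaFromVCPUs(1.3))
}
```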
@mocolic as Kata really does not (and cannot really) impose container-level cgroups unless cpusets are used, can you expand a little bit on what your container-level cgroup is expected to contain? We will place the kata-shim in this cgroup, but the actual workloads will be in what we will designate as the
@fupan's approach looks good. Prepend would be better than append imo. Let's get the tests fixed.
Prepend a kata-specific string to the OCI cgroup path to form a different cgroup path, so that cAdvisor can't find the Kata Containers cgroup path on the host, preventing it from grabbing the stats data. Fixes: kata-containers#1488 Signed-off-by: lifupan <[email protected]>
Kata prepends a kata-specific string to the cgroup path name to prevent cAdvisor from picking up stats data; otherwise, the kubelet will use that "zero" data to override the stats data obtained from the CRI provider. Fixes: kata-containers#1461 Depends-on: github.com/kata-containers/runtime#1518 Signed-off-by: lifupan <[email protected]>
The changes for the tests are here: kata-containers/tests#1462
/test
@chavafg can we have a test case for this, using something along the lines of a pod such as
And then check the actual results via kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/test-metrics", which should yield pretty high CPU utilization along the lines of
virtcontainers: prepend a kata specific string to host cgroups path (cherry picked from commit d99693a) Fixes: kata-containers#1488 Signed-off-by: Ganesh Maharaj Mahalingam <[email protected]>
Prepend a kata-specific string to the OCI cgroup path to form a different cgroup path, so that cAdvisor can't find the Kata Containers cgroup path on the host, preventing it from grabbing the stats data.
Fixes: #1488
Signed-off-by: lifupan [email protected]
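For context, the idea behind the change can be sketched roughly as below; the prefix value and helper name are assumptions for illustration, and the real runtime may place the prefix differently within the path:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// kataCgroupPrefix is illustrative; the exact string used by the runtime
// may differ. The point is only that the host-side cgroup name no longer
// matches the OCI-provided one, so cAdvisor cannot discover it.
const kataCgroupPrefix = "kata_"

// renameCgroupPath prepends the prefix to the last element of the OCI
// cgroup path, keeping the parent hierarchy (and therefore any limits an
// orchestrator places on the parent, e.g. /fabric/ or /kubepods/) intact.
func renameCgroupPath(ociPath string) string {
	parent, base := filepath.Split(filepath.Clean(ociPath))
	return filepath.Join(parent, kataCgroupPrefix+base)
}

func main() {
	oci := "/kubepods/besteffort/pod1234/container5678"
	fmt.Println(renameCgroupPath(oci))
	// Output: /kubepods/besteffort/pod1234/kata_container5678
}
```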