Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CPU affinity to executed processes #1253

Merged
merged 2 commits into from
Jun 25, 2024

Conversation

kolyshkin
Copy link
Contributor

This allows to set initial and final CPU affinity for a process being run in a container, which is needed to solve the issue described in opencontainers/runc#3922.

This is an alternative to #1252.

@kolyshkin kolyshkin force-pushed the exec-aff branch 2 times, most recently from 8179d10 to cb918d2 Compare May 19, 2024 22:27
@kolyshkin kolyshkin requested a review from AkihiroSuda May 19, 2024 22:27
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

utam0k

This comment was marked as duplicate.

@kolyshkin
Copy link
Contributor Author

@opencontainers/runtime-spec-maintainers PTAL (I'm going to merge this this week if there are no other comments)

Copy link

@bartwensley bartwensley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

config.md Outdated
on after the transition to container's cgroup. The format is either the
same as for `initial`, or a special value `cgroup` which means container's
CPU affinity, as defined by [cpu.cpus property](./config.md#configLinuxCPUs)).
If omitted or empty, the `initial` setting is kept.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be the opposite? I mean "if ommitted, cpu.cpus will be in effect"?
The initial seems to allow to set affinity to cpu cores that are outside of the set which is specified by cpu.cpus.
This also trigger the question of validation of the values between cpuAffinity and cpu.cpus.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two questions here.

  1. What is the default if final is not set. You are probably right that cpu.cpus makes more sense. UPDATED.

  2. Whether to allow affinity that's outside of container's cpu.cpus. While it seems theoretically possible for the initial cpuset, I haven't checked that it works, and I don't want to add any specific constraints to the spec (or to runc/crun).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL @kad

Also, s/in/on/g.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Copy link
Contributor Author

@kad thanks, changed like this:

diff --git a/config.md b/config.md
index eab6eec..b9b5573 100644
--- a/config.md
+++ b/config.md
@@ -340,9 +340,10 @@ For Linux-based systems, the `process` object supports the following process-spe
 
     * **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`.
     * **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest).
-* **`cpuAffinity`** (object, OPTIONAL) specifies CPU affinity used to execute the process.
+* **`execCPUAffinity`** (object, OPTIONAL) specifies CPU affinity used to execute the process.
+    This setting is not applicable to the container's init process.
     The following properties are available:

-    * **`initial`** (string, REQUIRED) is a list of CPUs a runtime parent
+    * **`initial`** (string, OPTIONAL) is a list of CPUs a runtime parent
       process to be run on initially, before the transition to container's
       cgroup. This is a a comma-separated list, with dashes to represent
       ranges. For example, `0-3,7` represents CPUs 0,1,2,3, and 7.
@@ -427,7 +427,7 @@ _Note: symbolic name for uid and gid, such as uname and gname respectively, are
             "soft": 1024
         }
     ],
-    "cpuAffinity": {
+    "execCPUAffinity": {
         "initial": "7",
         "final": "0-3,7"
     }
diff --git a/schema/config-schema.json b/schema/config-schema.json
index 39cc71e..cb74342 100644
--- a/schema/config-schema.json
+++ b/schema/config-schema.json
@@ -221,11 +221,8 @@
                         }
                     }
                 },
-                "cpuAffinity": {
+                "execCPUAffinity": {
                     "type": "object",
-                    "required": [
-                        "initial"
-                    ],
                     "properties": {
                         "initial": {
                             "type": "string",
diff --git a/specs-go/config.go b/specs-go/config.go
index 00bf946..290b6dc 100644
--- a/specs-go/config.go
+++ b/specs-go/config.go
@@ -94,8 +94,8 @@ type Process struct {
        SelinuxLabel string `json:"selinuxLabel,omitempty" platform:"linux"`
        // IOPriority contains the I/O priority settings for the cgroup.
        IOPriority *LinuxIOPriority `json:"ioPriority,omitempty" platform:"linux"`
-       // CPUAffinity specifies CPU affinity for the process.
-       CPUAffinity *CPUAffinity `json:"cpuAffinity,omitempty" platform:"linux"`
+       // ExecCPUAffinity specifies CPU affinity for exec processes.
+       ExecCPUAffinity *CPUAffinity `json:"execCPUAffinity,omitempty" platform:"linux"`
 }
 
 // LinuxCapabilities specifies the list of allowed capabilities that are kept for a process.

Copy link
Contributor

@kad kad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for update. looks better now. I'm still not convinced about names initial and final, maybe something more descriptive can be done like execRuntime / execContainer for consistency with hooks?
But for that I'll let others to chime in.

* **`final`** (string, OPTIONAL) is a list of CPUs the process will be run
on after the transition to container's cgroup. The format is the same as
for `initial`. If omitted or empty, the container's default CPU affinity,
as defined by [cpu.cpus property](./config.md#configLinuxCPUs)), is used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the tests, if you don't call sched_setaffinity(), the kernel itself will reset it to cgroups cpuset.cpu, so, maybe here statement can be simplified, that if not specified, after transition into container namespace, default affinity depends on kernel implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is nicer to have a reference to another way of setting cpu affinity here. If removed, it still says the same thing:

       on after the transition to container's cgroup. The format is the same as
-      for `initial`. If omitted or empty, the container's default CPU affinity,
-      as defined by [cpu.cpus property](./config.md#configLinuxCPUs)), is used.
+      for `initial`. If omitted or empty, the container's default CPU affinity
+      applies.
 

so I think it's better to keep it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, ok for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the tests, if you don't call sched_setaffinity(), the kernel itself will reset it to cgroups cpuset.cpu

@kad It seems your tests are wrong. Mine are showing this only happens when the initial (pre cgroup-move) affinity is not a subset of cgroup's affinity.

So maybe we need to change the spec to say something like

If omitted or empty, no attempt to change the CPU affinity of the process after moving it to container's cgroup is made, and its final affinity is determined by the Linux kernel.

I'd rather not go into more details here, because otherwise I'll be documenting the kernel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my tests, it is not the case. (-p is "pause" where I'm attaching in another terminal process to cgroup with cpuset.cpus=0-10):

r640-2:/home/akanevsk # ~akanevsk/getaffinity -g -s "0-5" -g -p -g
get_affinity: 0-87
set_affinity: affinity set to 0-5
get_affinity: 0-5
get_affinity: 0-10
r640-2:/home/akanevsk # ~akanevsk/getaffinity -g -s "50-60" -g -p -g
get_affinity: 0-87
set_affinity: affinity set to 50-60
get_affinity: 50-60
get_affinity: 0-10

This allows to set initial and final CPU affinity for a process being
run in a container, which is needed to solve the issue described in [1].

[1] opencontainers/runc#3922

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin merged commit 7017384 into opencontainers:main Jun 25, 2024
3 checks passed
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jul 17, 2024
As per
 - opencontainers/runtime-spec#1253
 - opencontainers/runtime-spec#1261

Add some tests (alas it's impossible to test initial CPU affinity
without adding debug logging).

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Aug 16, 2024
As per
 - opencontainers/runtime-spec#1253
 - opencontainers/runtime-spec#1261

Add some tests (alas it's impossible to test initial CPU affinity
without adding debug logging).

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jan 8, 2025
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jan 8, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

Add some tests (alas it's impossible to test initial CPU affinity
without adding debug logging).

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jan 9, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

Add some tests (alas it's impossible to test initial CPU affinity
without adding debug logging).

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jan 10, 2025
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jan 10, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

Add some tests (alas it's impossible to test initial CPU affinity
without adding debug logging).

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jan 15, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

Add some tests (alas it's impossible to test initial CPU affinity
without adding debug logging to nsexec).

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jan 16, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

CPU affinity can be set in two ways:
1. When creating/starting a container, in config.json's
   Process.ExecCPUAffinity, which is when applied to all execs.
2. When running an exec, in process.json's CPUAffinity, which
   applied to a given exec and overrides the value from (1).

Add some basic tests.

Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a
process to that of a container's cgroup, as soon as it is moved to that
cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that.

Because of the above,
 - it's impossible to really test initial CPU affinity without adding
   debug logging to libcontainer/nsenter;
 - for older kernels, there can be a brief moment when exec's affinity
   is different than either initial or final affinity being set;
 - exec's final CPU affinity, if not specified, can be different
   depending on the kernel, therefore we don't test it.

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jan 21, 2025
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jan 21, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

CPU affinity can be set in two ways:
1. When creating/starting a container, in config.json's
   Process.ExecCPUAffinity, which is when applied to all execs.
2. When running an exec, in process.json's CPUAffinity, which
   applied to a given exec and overrides the value from (1).

Add some basic tests.

Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a
process to that of a container's cgroup, as soon as it is moved to that
cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that.

Because of the above,
 - it's impossible to really test initial CPU affinity without adding
   debug logging to libcontainer/nsenter;
 - for older kernels, there can be a brief moment when exec's affinity
   is different than either initial or final affinity being set;
 - exec's final CPU affinity, if not specified, can be different
   depending on the kernel, therefore we don't test it.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin mentioned this pull request Feb 25, 2025
11 tasks
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Feb 28, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

CPU affinity can be set in two ways:
1. When creating/starting a container, in config.json's
   Process.ExecCPUAffinity, which is when applied to all execs.
2. When running an exec, in process.json's CPUAffinity, which
   applied to a given exec and overrides the value from (1).

Add some basic tests.

Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a
process to that of a container's cgroup, as soon as it is moved to that
cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that.

Because of the above,
 - it's impossible to really test initial CPU affinity without adding
   debug logging to libcontainer/nsenter;
 - for older kernels, there can be a brief moment when exec's affinity
   is different than either initial or final affinity being set;
 - exec's final CPU affinity, if not specified, can be different
   depending on the kernel, therefore we don't test it.

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Mar 2, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

CPU affinity can be set in two ways:
1. When creating/starting a container, in config.json's
   Process.ExecCPUAffinity, which is when applied to all execs.
2. When running an exec, in process.json's CPUAffinity, which
   applied to a given exec and overrides the value from (1).

Add some basic tests.

Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a
process to that of a container's cgroup, as soon as it is moved to that
cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that.

Because of the above,
 - it's impossible to really test initial CPU affinity without adding
   debug logging to libcontainer/nsenter;
 - for older kernels, there can be a brief moment when exec's affinity
   is different than either initial or final affinity being set;
 - exec's final CPU affinity, if not specified, can be different
   depending on the kernel, therefore we don't test it.

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Mar 2, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

CPU affinity can be set in two ways:
1. When creating/starting a container, in config.json's
   Process.ExecCPUAffinity, which is when applied to all execs.
2. When running an exec, in process.json's CPUAffinity, which
   applied to a given exec and overrides the value from (1).

Add some basic tests.

Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a
process to that of a container's cgroup, as soon as it is moved to that
cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that.

Because of the above,
 - it's impossible to really test initial CPU affinity without adding
   debug logging to libcontainer/nsenter;
 - for older kernels, there can be a brief moment when exec's affinity
   is different than either initial or final affinity being set;
 - exec's final CPU affinity, if not specified, can be different
   depending on the kernel, therefore we don't test it.

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Mar 2, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

CPU affinity can be set in two ways:
1. When creating/starting a container, in config.json's
   Process.ExecCPUAffinity, which is when applied to all execs.
2. When running an exec, in process.json's CPUAffinity, which
   applied to a given exec and overrides the value from (1).

Add some basic tests.

Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a
process to that of a container's cgroup, as soon as it is moved to that
cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that.

Because of the above,
 - it's impossible to really test initial CPU affinity without adding
   debug logging to libcontainer/nsenter;
 - for older kernels, there can be a brief moment when exec's affinity
   is different than either initial or final affinity being set;
 - exec's final CPU affinity, if not specified, can be different
   depending on the kernel, therefore we don't test it.

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Mar 2, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

CPU affinity can be set in two ways:
1. When creating/starting a container, in config.json's
   Process.ExecCPUAffinity, which is when applied to all execs.
2. When running an exec, in process.json's CPUAffinity, which
   applied to a given exec and overrides the value from (1).

Add some basic tests.

Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a
process to that of a container's cgroup, as soon as it is moved to that
cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that.

Because of the above,
 - it's impossible to really test initial CPU affinity without adding
   debug logging to libcontainer/nsenter;
 - for older kernels, there can be a brief moment when exec's affinity
   is different than either initial or final affinity being set;
 - exec's final CPU affinity, if not specified, can be different
   depending on the kernel, therefore we don't test it.

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Mar 3, 2025
As per
- opencontainers/runtime-spec#1253
- opencontainers/runtime-spec#1261

CPU affinity can be set in two ways:
1. When creating/starting a container, in config.json's
   Process.ExecCPUAffinity, which is when applied to all execs.
2. When running an exec, in process.json's CPUAffinity, which
   applied to a given exec and overrides the value from (1).

Add some basic tests.

Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a
process to that of a container's cgroup, as soon as it is moved to that
cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that.

Because of the above,
 - it's impossible to really test initial CPU affinity without adding
   debug logging to libcontainer/nsenter;
 - for older kernels, there can be a brief moment when exec's affinity
   is different than either initial or final affinity being set;
 - exec's final CPU affinity, if not specified, can be different
   depending on the kernel, therefore we don't test it.

Signed-off-by: Kir Kolyshkin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants