-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CPU affinity to executed processes #1253
Conversation
8179d10
to
cb918d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@opencontainers/runtime-spec-maintainers PTAL (I'm going to merge this this week if there are no other comments) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
config.md
Outdated
on after the transition to container's cgroup. The format is either the | ||
same as for `initial`, or a special value `cgroup` which means container's | ||
CPU affinity, as defined by [cpu.cpus property](./config.md#configLinuxCPUs)). | ||
If omitted or empty, the `initial` setting is kept. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't it be the opposite? I mean "if ommitted, cpu.cpus will be in effect"?
The initial
seems to allow to set affinity to cpu cores that are outside of the set which is specified by cpu.cpus
.
This also trigger the question of validation of the values between cpuAffinity
and cpu.cpus
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two questions here.
-
What is the default if
final
is not set. You are probably right thatcpu.cpus
makes more sense. UPDATED. -
Whether to allow affinity that's outside of container's cpu.cpus. While it seems theoretically possible for the
initial
cpuset, I haven't checked that it works, and I don't want to add any specific constraints to the spec (or to runc/crun).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL @kad
Also, s/in/on/g. Signed-off-by: Kir Kolyshkin <[email protected]>
@kad thanks, changed like this: diff --git a/config.md b/config.md
index eab6eec..b9b5573 100644
--- a/config.md
+++ b/config.md
@@ -340,9 +340,10 @@ For Linux-based systems, the `process` object supports the following process-spe
* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`.
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest).
-* **`cpuAffinity`** (object, OPTIONAL) specifies CPU affinity used to execute the process.
+* **`execCPUAffinity`** (object, OPTIONAL) specifies CPU affinity used to execute the process.
+ This setting is not applicable to the container's init process.
The following properties are available:
- * **`initial`** (string, REQUIRED) is a list of CPUs a runtime parent
+ * **`initial`** (string, OPTIONAL) is a list of CPUs a runtime parent
process to be run on initially, before the transition to container's
cgroup. This is a a comma-separated list, with dashes to represent
ranges. For example, `0-3,7` represents CPUs 0,1,2,3, and 7.
@@ -427,7 +427,7 @@ _Note: symbolic name for uid and gid, such as uname and gname respectively, are
"soft": 1024
}
],
- "cpuAffinity": {
+ "execCPUAffinity": {
"initial": "7",
"final": "0-3,7"
}
diff --git a/schema/config-schema.json b/schema/config-schema.json
index 39cc71e..cb74342 100644
--- a/schema/config-schema.json
+++ b/schema/config-schema.json
@@ -221,11 +221,8 @@
}
}
},
- "cpuAffinity": {
+ "execCPUAffinity": {
"type": "object",
- "required": [
- "initial"
- ],
"properties": {
"initial": {
"type": "string",
diff --git a/specs-go/config.go b/specs-go/config.go
index 00bf946..290b6dc 100644
--- a/specs-go/config.go
+++ b/specs-go/config.go
@@ -94,8 +94,8 @@ type Process struct {
SelinuxLabel string `json:"selinuxLabel,omitempty" platform:"linux"`
// IOPriority contains the I/O priority settings for the cgroup.
IOPriority *LinuxIOPriority `json:"ioPriority,omitempty" platform:"linux"`
- // CPUAffinity specifies CPU affinity for the process.
- CPUAffinity *CPUAffinity `json:"cpuAffinity,omitempty" platform:"linux"`
+ // ExecCPUAffinity specifies CPU affinity for exec processes.
+ ExecCPUAffinity *CPUAffinity `json:"execCPUAffinity,omitempty" platform:"linux"`
}
// LinuxCapabilities specifies the list of allowed capabilities that are kept for a process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for update. looks better now. I'm still not convinced about names initial
and final
, maybe something more descriptive can be done like execRuntime
/ execContainer
for consistency with hooks?
But for that I'll let others to chime in.
* **`final`** (string, OPTIONAL) is a list of CPUs the process will be run | ||
on after the transition to container's cgroup. The format is the same as | ||
for `initial`. If omitted or empty, the container's default CPU affinity, | ||
as defined by [cpu.cpus property](./config.md#configLinuxCPUs)), is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the tests, if you don't call sched_setaffinity()
, the kernel itself will reset it to cgroups cpuset.cpu, so, maybe here statement can be simplified, that if not specified, after transition into container namespace, default affinity depends on kernel implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is nicer to have a reference to another way of setting cpu affinity here. If removed, it still says the same thing:
on after the transition to container's cgroup. The format is the same as
- for `initial`. If omitted or empty, the container's default CPU affinity,
- as defined by [cpu.cpus property](./config.md#configLinuxCPUs)), is used.
+ for `initial`. If omitted or empty, the container's default CPU affinity
+ applies.
so I think it's better to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, ok for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the tests, if you don't call sched_setaffinity(), the kernel itself will reset it to cgroups cpuset.cpu
@kad It seems your tests are wrong. Mine are showing this only happens when the initial (pre cgroup-move) affinity is not a subset of cgroup's affinity.
So maybe we need to change the spec to say something like
If omitted or empty, no attempt to change the CPU affinity of the process after moving it to container's cgroup is made, and its final affinity is determined by the Linux kernel.
I'd rather not go into more details here, because otherwise I'll be documenting the kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my tests, it is not the case. (-p is "pause" where I'm attaching in another terminal process to cgroup with cpuset.cpus=0-10):
r640-2:/home/akanevsk # ~akanevsk/getaffinity -g -s "0-5" -g -p -g
get_affinity: 0-87
set_affinity: affinity set to 0-5
get_affinity: 0-5
get_affinity: 0-10
r640-2:/home/akanevsk # ~akanevsk/getaffinity -g -s "50-60" -g -p -g
get_affinity: 0-87
set_affinity: affinity set to 50-60
get_affinity: 50-60
get_affinity: 0-10
This allows to set initial and final CPU affinity for a process being run in a container, which is needed to solve the issue described in [1]. [1] opencontainers/runc#3922 Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 Add some tests (alas it's impossible to test initial CPU affinity without adding debug logging). Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 Add some tests (alas it's impossible to test initial CPU affinity without adding debug logging). Signed-off-by: Kir Kolyshkin <[email protected]>
This is to include - opencontainers/runtime-spec#1261 - opencontainers/runtime-spec#1253 Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 Add some tests (alas it's impossible to test initial CPU affinity without adding debug logging). Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 Add some tests (alas it's impossible to test initial CPU affinity without adding debug logging). Signed-off-by: Kir Kolyshkin <[email protected]>
This is to include - opencontainers/runtime-spec#1261 - opencontainers/runtime-spec#1253 Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 Add some tests (alas it's impossible to test initial CPU affinity without adding debug logging). Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 Add some tests (alas it's impossible to test initial CPU affinity without adding debug logging to nsexec). Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 CPU affinity can be set in two ways: 1. When creating/starting a container, in config.json's Process.ExecCPUAffinity, which is when applied to all execs. 2. When running an exec, in process.json's CPUAffinity, which applied to a given exec and overrides the value from (1). Add some basic tests. Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a process to that of a container's cgroup, as soon as it is moved to that cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that. Because of the above, - it's impossible to really test initial CPU affinity without adding debug logging to libcontainer/nsenter; - for older kernels, there can be a brief moment when exec's affinity is different than either initial or final affinity being set; - exec's final CPU affinity, if not specified, can be different depending on the kernel, therefore we don't test it. Signed-off-by: Kir Kolyshkin <[email protected]>
This is to include - opencontainers/runtime-spec#1261 - opencontainers/runtime-spec#1253 Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 CPU affinity can be set in two ways: 1. When creating/starting a container, in config.json's Process.ExecCPUAffinity, which is when applied to all execs. 2. When running an exec, in process.json's CPUAffinity, which applied to a given exec and overrides the value from (1). Add some basic tests. Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a process to that of a container's cgroup, as soon as it is moved to that cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that. Because of the above, - it's impossible to really test initial CPU affinity without adding debug logging to libcontainer/nsenter; - for older kernels, there can be a brief moment when exec's affinity is different than either initial or final affinity being set; - exec's final CPU affinity, if not specified, can be different depending on the kernel, therefore we don't test it. Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 CPU affinity can be set in two ways: 1. When creating/starting a container, in config.json's Process.ExecCPUAffinity, which is when applied to all execs. 2. When running an exec, in process.json's CPUAffinity, which applied to a given exec and overrides the value from (1). Add some basic tests. Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a process to that of a container's cgroup, as soon as it is moved to that cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that. Because of the above, - it's impossible to really test initial CPU affinity without adding debug logging to libcontainer/nsenter; - for older kernels, there can be a brief moment when exec's affinity is different than either initial or final affinity being set; - exec's final CPU affinity, if not specified, can be different depending on the kernel, therefore we don't test it. Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 CPU affinity can be set in two ways: 1. When creating/starting a container, in config.json's Process.ExecCPUAffinity, which is when applied to all execs. 2. When running an exec, in process.json's CPUAffinity, which applied to a given exec and overrides the value from (1). Add some basic tests. Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a process to that of a container's cgroup, as soon as it is moved to that cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that. Because of the above, - it's impossible to really test initial CPU affinity without adding debug logging to libcontainer/nsenter; - for older kernels, there can be a brief moment when exec's affinity is different than either initial or final affinity being set; - exec's final CPU affinity, if not specified, can be different depending on the kernel, therefore we don't test it. Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 CPU affinity can be set in two ways: 1. When creating/starting a container, in config.json's Process.ExecCPUAffinity, which is when applied to all execs. 2. When running an exec, in process.json's CPUAffinity, which applied to a given exec and overrides the value from (1). Add some basic tests. Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a process to that of a container's cgroup, as soon as it is moved to that cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that. Because of the above, - it's impossible to really test initial CPU affinity without adding debug logging to libcontainer/nsenter; - for older kernels, there can be a brief moment when exec's affinity is different than either initial or final affinity being set; - exec's final CPU affinity, if not specified, can be different depending on the kernel, therefore we don't test it. Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 CPU affinity can be set in two ways: 1. When creating/starting a container, in config.json's Process.ExecCPUAffinity, which is when applied to all execs. 2. When running an exec, in process.json's CPUAffinity, which applied to a given exec and overrides the value from (1). Add some basic tests. Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a process to that of a container's cgroup, as soon as it is moved to that cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that. Because of the above, - it's impossible to really test initial CPU affinity without adding debug logging to libcontainer/nsenter; - for older kernels, there can be a brief moment when exec's affinity is different than either initial or final affinity being set; - exec's final CPU affinity, if not specified, can be different depending on the kernel, therefore we don't test it. Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 CPU affinity can be set in two ways: 1. When creating/starting a container, in config.json's Process.ExecCPUAffinity, which is when applied to all execs. 2. When running an exec, in process.json's CPUAffinity, which applied to a given exec and overrides the value from (1). Add some basic tests. Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a process to that of a container's cgroup, as soon as it is moved to that cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that. Because of the above, - it's impossible to really test initial CPU affinity without adding debug logging to libcontainer/nsenter; - for older kernels, there can be a brief moment when exec's affinity is different than either initial or final affinity being set; - exec's final CPU affinity, if not specified, can be different depending on the kernel, therefore we don't test it. Signed-off-by: Kir Kolyshkin <[email protected]>
As per - opencontainers/runtime-spec#1253 - opencontainers/runtime-spec#1261 CPU affinity can be set in two ways: 1. When creating/starting a container, in config.json's Process.ExecCPUAffinity, which is when applied to all execs. 2. When running an exec, in process.json's CPUAffinity, which applied to a given exec and overrides the value from (1). Add some basic tests. Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a process to that of a container's cgroup, as soon as it is moved to that cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that. Because of the above, - it's impossible to really test initial CPU affinity without adding debug logging to libcontainer/nsenter; - for older kernels, there can be a brief moment when exec's affinity is different than either initial or final affinity being set; - exec's final CPU affinity, if not specified, can be different depending on the kernel, therefore we don't test it. Signed-off-by: Kir Kolyshkin <[email protected]>
This allows to set initial and final CPU affinity for a process being run in a container, which is needed to solve the issue described in opencontainers/runc#3922.
This is an alternative to #1252.