-
Notifications
You must be signed in to change notification settings - Fork 349
--privileged should add explicit "rw" config for /sys mount #753
Comments
@corrieb Thanks for the bug report. I'll look into it. |
It seems fine in my cluster: # crictl ps | grep kube-proxy
1b5e8f1282b81 sha256:bfc21aadc7d3e20e34cec769d697f93543938e9151c653591861ec5f2429676b 23 hours ago CONTAINER_RUNNING kube-proxy
# crictl exec -s 1b5e8f1282b81 /bin/sh -c "cat /proc/mounts | grep sysfs"
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
# crictl inspect 1b5e8f1282b81
...
{
"destination": "/sys",
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
... And actually Docker is doing similar thing IIUC. https://github.com/moby/moby/blob/master/daemon/oci_linux.go#L679
Can you share your pod yaml? |
Thanks for helping me investigate @Random-Liu. I'm not using crictl, I'm creating a PodSandboxConfig and ContainerConfig, based on the v1.Pod specification for kube-proxy. I'm trying to figure out why I'm seeing this inconsistency. It could be something I've missed, but I'm struggling to see what it might be. Here are my configs:
|
This is a dump of a kube-proxy Pod Spec (not the exact same pod as you'll see from the ID)
|
As you can see, the Pod spec states that it should be run Privileged, but it doesn't state anything explicit about /sys and how it should be mounted. I've been assuming that setting If I run
This therefore seems to boil down to a question of what the "default" /sys mount is expected to be if neither "ro" or "rw" is specified? It appears to be being interpreted as "ro". |
I see. What is the runc version you are using? And what is your OS? Does docker work in your environment in this case? We do the same thing with docker. |
runc version 1.0.0-rc5 - containerd 1.1.0. OS is Debian Linux 4.9.0-6-amd64 It's interesting to note that the CRI spec is very clear about what Privileged means: https://github.com/kubernetes/kubernetes/blob/a38a02792b55942177ee676a5e1993b18a8b4b0a/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto#L541
What's not clear though is whether runc should interpret Privileged in this way or whether that should be made explicit by the caller. That is indeed the case with ctr - it explicitly adds "rw" for Privileged before runc gets to see the spec. If that's the answer, then that's the answer. The client needs to make the Privileged mounts explicit. If so, I'll modify my code to do that. |
What's curious to me though @Random-Liu is that your mount spec looks just the same as mine as reported by inspect, but you have a different outcome. That potentially suggests an OS configuration difference. |
@Random-Liu did you ever figure out this: moby/moby#24000? |
@corrieb No. We never able to figure that out. Can you update your code to explicitly set I'd like to spend a little more time to figure out why there is such difference. It could be some kernel version difference, or something? Let me do a search. But anyway, I do think we need your patch if that fixes the problem for you. :) Thanks a lot for looking into this! This might fix a problem which is there for a long time in both docker and |
I'll do some research. If we find a default behavior difference caused by OS version or configure difference, we should fix docker as well. |
I think it's pretty much guaranteed to fix it. When I run |
What this seems to boil down to is what happens if neither "ro" or "rw" are specified. cri-containerd makes sure to remove "ro", but doesn't add "rw". That ultimately makes it ambiguous |
@corrieb Yeah, i mean cri-containerd does that because docker is doing so. But it seems that this may not work in some OS config/version. Would you like to send a PR to add |
It would be super helpful if you can verify the change, because I can't reproduce it. :) You can simply |
Absolutely. I'll need to go through our legal dept to get myself approved to submit code to containerd, but heck - that's a worthwhile think to do regardless. @justincormack curious if you have an opinion on this. |
@corrieb Will wait for your fix. If possible, I'd still like to root cause this:
@justincormack Do you have any idea? |
|
@corrieb One last thing... If possible, can you exec into the kube-proxy pod to check whether cgroup is |
@corrieb Thanks for helping me verify. The information is very useful for future debugging. This seems specific to Let's explicitly set Thanks a lot for looking into this! |
@corrieb Do you have any problem? Probably I can send a fix for you? :) |
@Random-Liu Thanks for doing this. I was at KubeCon last week, this was by far the fastest way to get it done. |
@corrieb No problem. Did it for you. :) Thanks a lot for finding and reporting the bug! |
If you create a privileged container using the ctr client, it will explicitly mount /sys using the "rw" flag. See https://github.com/containerd/containerd/blob/9d9d1bc13c107a460212d12ed7ee2f422379a10f/oci/spec_opts_unix.go#L602
cri-containerd simply removes the "ro" flag for --privileged, assuming the container runtime will default to a RW mount, which appears to not be the case. See
cri/pkg/server/container_create.go
Line 666 in ed92bef
As a result, running kube-proxy using containerd 1.1 via cri-containerd results in the following error:
E0426 18:40:33.281905 5 conntrack.go:124] sysfs is not writable: {Device:sysfs Path:/sys Type:sysfs Opts:[ro nosuid nodev noexec relatime] Freq:0 Pass:0} (mount options are [ro nosuid nodev noexec relatime])
The text was updated successfully, but these errors were encountered: