Do not pass all host devices in /dev when launching with '--privileged' #1568
Comments
Please see the related issue and root cause here: |
any idea about this issue? |
Hi @zhiminghufighting - I've updated the description and reformatted the details to make this easier to understand. Please can you provide details of which devices you would not expect to see? |
@zhiminghufighting I am not sure why you are seeing this behaviour, before I go take a look, can you tell me what is the specific android image that you are using? |
This is highly unexpected and undocumented behavior that just fell on my feet during a CTF competition that uses Kata Containers for isolation :) I passed Applied a local hotfix (no-op |
@leoluk - saw your PR over at kata-containers/documentation#452, thx. |
See kata-containers/runtime#1568 Fixes kata-containers#453 Signed-off-by: Leopold Schabel <[email protected]>
@amshinde I use an Android-based container that runs in Tencent cloud gaming. (This Android-based image was also made by the Intel AIS team for Tencent cloud gaming.) @jodh-intel Thanks for your comment and corrections. |
@zhiminghufighting Sorry for the delay in getting back. Finally, are you running this under docker or k8s? Could you provide the config.json file that gets passed to the runtime. For docker this should be under If you could point me to the android image that I should use, I can try and reproduce this as well. |
The number of devices differs between the host, Kata, and runc.
|
I don't follow. What makes you think that devices are shared in --privileged mode? What is the evidence? |
Can we make this configurable, at least (and default it to off)? There's plenty of use cases that need privileged access inside the container only. |
Hi @leoluk, can you share a bit more about how you do filterDevices as the hotfix? |
+1 for this issue. Using Kata to isolate and secure Docker in docker is pointless when all of the host devices are simply hot plugged into the VM, including the host's I would like to see a configurable option in the Kata config to control filtering of the Another possible option that @egernst suggested is to add support for an annotation, that when used would enable a more kata specific privileged mode - support for all capabilities, writable sysfs, but no host devices. A cut down version of Thoughts? |
-1.
If you just need the second one, you can list the required capabilities you need with OTOH, if users require a simple way to specify that |
No, it translates into a bit more than that, see what is provided in containerd - https://github.com/containerd/containerd/blob/172fe90e55c3c3a452f9bec926d87ffda5ed01bf/oci/spec_opts.go#L1090-L1101 |
@awprice Still, please file issues in moby/containerd to request such a flag that does not send all Generally At the very least, I'm fine with using an annotation, as @egernst suggested, to signal that users actually want to ignore all devices. That way we know that users do want it, and it is not a wild guess about the real use case. But I still think moby/containerd is the right place for the fix. |
Kata is already interpreting the container spec quite liberally due to the extra abstraction layer involved. For instance, capabilities will not allow you to access the host system, but only the guest kernel in the VM. Why make an exception for devices? |
This feature can be implemented as a new flag for e.g. $ cat /etc/docker/daemon.json
{
"runtimes": {
"kata": {
"path": "/opt/kata/bin/kata-runtime"
},
"kata-isolate-dev": {
"path": "/opt/kata/bin/kata-runtime",
"runtimeArgs": [
"--kata-device-mode=isolate"
]
}
}
}
$ docker run --privileged --runtime=kata-isolate-dev docker:dind ... |
@leoluk Kata does translate container spec to apply a different secure context. We handle the container spec as much as we can, and there are pieces we have to ignore because there is no way to handle them in a virtual machine context. Here you want devices that are translated now to be ignored. That is a huge difference.
All you guys want is to run docker in docker with kata, even in a half broken way. But I think it is better to provide a proper
This seems to match containerd's translation in a virtual machine context: https://github.com/containerd/containerd/blob/172fe90e55c3c3a452f9bec926d87ffda5ed01bf/oci/spec_opts.go#L1090-L1101 Then you can do: $ cat /etc/docker/daemon.json
{
"runtimes": {
"kata": {
"path": "/opt/kata/bin/kata-runtime"
},
"kata-privileged": {
"path": "/opt/kata/bin/kata-runtime",
"runtimeArgs": [
"--privileged"
]
}
}
}
$ docker run --runtime=kata-privileged docker:dind ... And Then we can officially recommend |
I'm not against an annotation, we already abuse them for other use cases. |
Sigh, that is too bad, not having container annotations. Then the containerd/cri change (containerd/cri#1213) seems more appealing than before. wdyt about it? |
@bergwolf Sounds like a good approach to make the change in containerd/cri, as that's where the host devices are being listed and appended to the spec. Only other concern - what about cri-o? |
I'm not familiar with cri-o but I think it can make the same change. Right now the translation for privileged container is runc specific. It makes sense to have a different translation as runc is not the only runtime containerd/cri and cri-o support. |
@dadux Does your usecase really need the dind container in the same pod? |
No, because dind in its own VM still requires privileged mode and, in its current state, will have the host devices mounted in. If there were a vulnerability in dind or in the kernel of the dind pod, access to the host devices could occur. It's also unnecessarily complicated. |
@awprice My question was about the future status with some alternative to the current In other words - the alternative solution really needs to be container-level rather than pod-level? |
We have hard multi-tenancy requirements. Pods (CI/CD builds) are isolated by namespace and have strict NetworkPolicies enforced, and, when running in Kata, by different kernels. |
@AkihiroSuda No, the solution would be pod level, as that's the lowest level of object that you can place an annotation on. The logic in containerd/cri, I imagine, would work like this: if a container in the pod is privileged AND there is a special annotation present on the pod to disable host devices, then don't append the host devices to the container spec. |
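The decision described in the comment above could be sketched roughly as follows. This is only an illustration of the proposed logic, not the containerd/cri implementation: the annotation key, the `Device` type, and the `devicesForContainer` helper are all hypothetical names made up for this sketch.

```go
package main

import "fmt"

// disableHostDevicesAnnotation is a hypothetical annotation key used only in
// this sketch; it is not the key used by containerd/cri or Kata.
const disableHostDevicesAnnotation = "example.io/privileged-without-host-devices"

// Device is a minimal stand-in for a device entry in a container spec.
type Device struct {
	Path string
}

// devicesForContainer returns the devices that would be placed in the
// container spec. Explicitly requested devices are always kept. Host devices
// are appended only for privileged containers whose pod does not carry the
// opt-out annotation.
func devicesForContainer(privileged bool, podAnnotations map[string]string, requested, host []Device) []Device {
	out := append([]Device{}, requested...)
	if !privileged {
		return out
	}
	// Reading from a nil map is safe in Go and yields the empty string.
	if podAnnotations[disableHostDevicesAnnotation] == "true" {
		return out
	}
	return append(out, host...)
}

func main() {
	host := []Device{{Path: "/dev/sda"}, {Path: "/dev/kvm"}}
	requested := []Device{{Path: "/dev/fuse"}}
	optOut := map[string]string{disableHostDevicesAnnotation: "true"}

	// Privileged with the opt-out annotation: only the requested device.
	fmt.Println(len(devicesForContainer(true, optOut, requested, host))) // 1
	// Privileged without the annotation: requested plus all host devices.
	fmt.Println(len(devicesForContainer(true, nil, requested, host))) // 3
}
```

Keeping the check at pod level matches the point made above that a pod is the lowest-level object an annotation can be placed on.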
Can we extend the CRI spec to add an option for system fs |
It looks like it's already been discussed, to redefine But nothing has happened. 😢 |
Hi, @dadux , is it truly unacceptable that all containers in a pod are run as privileged? |
@smarton6626 - yes. In our case we're talking about untrusted containers running arbitrary code. If the untrusted container is privileged, it has the capabilities to mount/attach devices and, for instance, to access the secrets of other containers in the pod. |
@dadux - Is it avoidable? I mean, how about treating the whole pod as untrusted and not using secret in it? |
The secrets were just an illustration; we treat the whole pod as insecure. But we also don't grant extra permissions to containers that don't need them. We drop all CAPS for those containers. "privileged" mode just makes it a ton easier to break out of the container. Why would you want to allow that when you don't have to? There's good progress on the containerd/cri issue, with proposed solutions. |
@dadux - Because I don't care if they break out of the container if I treat the whole pod as insecure. They are all in a VM and can do nothing to my host. Think about the VM instances in a public cloud (e.g. EC2). |
@smarton6626 - But we do care; having extra kernel isolation is awesome but not bulletproof. Anyway, I don't think we're adding much value to the initial problem here. |
@dadux - Well, thanks for your input, which gives me a different perspective. I'll think that over, although for now I still believe a VM is secure enough for isolating untrusted workloads. |
Proposal for porting over this to Moby/Docker: moby/moby#39697 PR: moby/moby#39702 |
Is this solved? |
It's implemented in Kata and containerd, but has not yet propagated to Docker. |
@haslersn, please, see http://lists.katacontainers.io/pipermail/kata-dev/2021-April/001819.html, there you'll find the explanation why the issue was closed.
|
This one was rightfully closed, the feature in Kata is implemented and works! :-) |
Description of problem
When I launch an Android image with the --privileged parameter, all devices under the host /dev directory are passed into the container's /dev directory. This behavior doesn't make sense.
Expected result
Don't pass all host devices into the Kata container, even with the --privileged parameter.
Actual result
Here is the ls info under the Kata container's /dev directory:
Here is the host view of docker ps: