Name resolution failure when using Docker custom networks (swarm/compose) #175
From @jodh-intel on March 2, 2018 9:06: Hi @gvancuts - thanks for reporting. As explained in the issue template, it would be helpful if you could paste in the output of
From @gvancuts on March 2, 2018 10:52: Hi @jodh-intel, here is the output from my Fedora 27 machine. Let me know if you would also like the output from my Clear Linux machine. If you have specific, additional debugging you'd like me to turn on, just shoot! Thanks!

$ sudo cc-collect-data.sh
From @jodh-intel on March 2, 2018 15:29: ping @egernst, @mcastelino.

ack, @jodh-intel -- taking a look at this today.
From @mcastelino on March 2, 2018 17:47: @gvancuts we have an issue with DNS w.r.t. swarm as described here. I suspect this is a similar issue. We will root-cause this failure. /cc @egernst @jodh-intel
From @mcastelino on March 2, 2018 22:12: @gvancuts just confirming that this is the same issue we have seen with docker swarm. Docker runs a resolver in the network namespace of the container and routes all DNS traffic through it. Only if dockerd cannot handle a request itself is it then sent to the host.
From @amshinde on March 2, 2018 22:25: @mcastelino I ran a bunch of tests and see the same behaviour: containers are on a separate network, with a separate DNS resolver (127.0.0.11) handling name resolution for the network.
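The behaviour described above can be checked from a container's /etc/resolv.conf: containers attached to a user-defined network get Docker's embedded resolver, 127.0.0.11, while containers on the default bridge inherit the host's nameservers. A minimal sketch of that check (not part of Docker or Kata, purely illustrative):

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf content."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

def uses_embedded_dns(resolv_conf_text):
    """True if the container is pointed at Docker's embedded resolver."""
    return "127.0.0.11" in nameservers(resolv_conf_text)

# What a container on a user-defined network typically sees, versus one
# on the default bridge (addresses are illustrative):
custom_net = "search example.com\nnameserver 127.0.0.11\noptions ndots:0\n"
default_bridge = "nameserver 8.8.8.8\nnameserver 8.8.4.4\n"
print(uses_embedded_dns(custom_net), uses_embedded_dns(default_bridge))  # True False
```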
From @mcastelino on March 2, 2018 23:02: @amshinde is this true for all custom networks with docker? That would mean we have a broader problem with all docker advanced networks. If so, we should prioritize tc mirroring support in Kata, as we have a potential path to addressing this once we have tc support. /cc @egernst
From @amshinde on March 2, 2018 23:07: @mcastelino This seems to be the case for all custom networks other than the default bridge network: https://docs.docker.com/v17.09/engine/userguide/networking/configure-dns/
From @sboeuf on March 2, 2018 23:20: @mcastelino I am curious to understand how the tc mirroring case can solve this issue. Is there an explanation somewhere?
From @mcastelino on March 2, 2018 23:33: Here are the details. If we implement support in the agent to proxy the DNS requests coming in from the VM, by listening on 127.0.0.11:53 and sending them to the shim running in the host namespace, which can forward the request on to dockerd, internal DNS resolution will work. This is a bit of work, but technically it can be done, and it will need to be built into our gRPC protocol. However, external DNS resolution will not work; details are below.

Internal DNS resolution
Internal DNS resolution is handled completely by dockerd. So dockerd directly responds to the DNS request from the container process for any cluster-local resource.

External DNS resolution
External DNS resolution is not handled by dockerd. When dockerd is unable to resolve the name to a cluster-local resource, it performs a DNS resolution using the host's resolv.conf. Hence the DNS resolution process for an external name is:
Here you will notice that dockerd sends packets out from within the namespace to the host via the interface bound to the docker_gwbridge. In the case of Clear Containers using macvtap/bridge, this request can never be fulfilled, as there is no path out of the network namespace on the host to the host network.

tc-based external DNS resolution
In the case of the tc-based approach, the network interface exists and is fully active in the network namespace on the host side, so this external DNS resolution traffic can be sent down the interface on the host side. The response received will have to be sent to dockerd (i.e. not mirrored back into the VM). So with tc we have a path to a potential solution, even though it is complex. That said, given that we have described the problem, I am also open to simpler solutions.
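For background, tc mirroring of the kind referenced above is usually built from an ingress qdisc plus a mirred redirect action on each side. A hedged sketch of the rule pairs follows; interface names are placeholders, and this shows the generic technique only - as the comment above notes, a real fix would additionally need selective rules so DNS responses reach dockerd instead of being mirrored straight back into the VM.

```python
# Build the tc commands that redirect ingress traffic between a host-side
# veth and the VM-facing tap device. "veth0"/"tap0" are placeholder names;
# this illustrates tc mirroring in general, not Kata's implementation.
def mirror_rules(src, dst):
    return [
        f"tc qdisc add dev {src} handle ffff: ingress",
        f"tc filter add dev {src} parent ffff: protocol all u32 "
        f"match u8 0 0 action mirred egress redirect dev {dst}",
    ]

# Redirect both directions: veth -> tap and tap -> veth.
for cmd in mirror_rules("veth0", "tap0") + mirror_rules("tap0", "veth0"):
    print(cmd)
```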
From @sboeuf on March 2, 2018 23:47: @mcastelino I don't know if you know, but we now spawn our shims in the pod network namespace. This means we have a process that could actually redirect external DNS requests, right?
From @mcastelino on March 2, 2018 23:51: @sboeuf yes, I am aware of that; that is good, and it makes this solution possible in the first place. But that in itself will not solve the external or internal DNS issue. For the internal DNS we need to add more logic to the shim. Once we add that, the external DNS should just work, provided we set up the tc mirroring rules properly to allow external DNS responses to make their way back to dockerd. So in short you still need a path out of and into the host network via the docker-created veth interface, which is where tc comes in.
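The proxy idea being discussed can be sketched roughly as follows. This is not Kata code: the listen and upstream addresses are assumptions for illustration, and a real implementation would live in the agent/shim and carry the traffic over the gRPC protocol mentioned above rather than plain UDP.

```python
import socket
import struct

LISTEN = ("127.0.0.11", 53)     # address the container's resolv.conf points at
UPSTREAM = ("127.0.0.1", 5353)  # hypothetical host-side forwarder endpoint

def txid(packet):
    """DNS transaction ID: the first two bytes of a DNS message, big-endian."""
    return struct.unpack("!H", packet[:2])[0]

def proxy_loop():
    """Relay each UDP query from LISTEN to UPSTREAM and return the answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as srv:
        srv.bind(LISTEN)
        while True:
            query, client = srv.recvfrom(512)
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as up:
                up.settimeout(2.0)
                up.sendto(query, UPSTREAM)
                try:
                    response, _ = up.recvfrom(512)
                except socket.timeout:
                    continue
            # Drop responses whose transaction ID does not match the query.
            if txid(response) == txid(query):
                srv.sendto(response, client)

print(hex(txid(b"\x12\x34\x01\x00")))  # a query with transaction ID 0x1234
```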
Since this is also observed in Kata Containers at this point, I think we should move this over there and open a new issue. I'd recommend marking swarm/compose as a "won't fix" for Clear. This way it'll be a bit easier/more straightforward to ask for input from other Kata contributors... Agreed?

/cc @bekars
Same issue, without docker-compose:
expected output:
actual output:
@sboeuf that does not help, as we do not have network connectivity back to the host from the network namespace. So only if we add support for tc will we have any chance of fixing this issue. /cc @amshinde
Here is a dirty workaround:
output of docker-compose up:
Please note that this workaround doesn't cover every case, since many features of docker-compose are not present in version 1.
I don't want to rush you, but I'd like to know if you have any clue how to fix this.
@fredbcode I am planning to try out a solution this week, based on the proposal outlined in this comment, to see if this works out. We already have some work done to implement tc mirroring in Kata.

@amshinde I also hope to have some good news soon.

Hello there,
Here is the link to the slides with the proposed solution I presented in the Kata architecture forum: @bergwolf @gnawux @WeiZhang555 @jon PTAL and add comments if you have any feedback/questions. cc @egernst
@fredbcode @Gabasjob @alebourdoulous We have got the ball rolling on this one. It does involve substantial changes that need to be implemented. Will let you know once we have this ready for testing.

Thanks, please let me know if you need help with testing.
@amshinde Hello, will there be a testing version soon?
Hello, I am really sorry to insist on this, but right now Kata is unusable in many cases; using an internal network is a very basic scenario. It's very frustrating to have such a great project disabled on our systems :)
Maybe the team is focusing on integration with k8s more than docker swarm now. PRs are welcome.
Unfortunately this issue goes deeper than just a specific problem with swarm or compose.
@amshinde @mcastelino - we should probably have an update on any progress or plans here - or, if we don't have the resources to implement it, have we documented how it could be done so somebody else might be able to pick it up?
@grahamwhaley I probably would not be able to get to this for at least a couple of weeks.
I have included the proposal I came up with in the link above. Anyone interested in picking this up can refer to that design proposal.

Hello,
Hello all, I was wondering how users cope with this issue. Do users usually just work without an internal docker network, or is there a workaround?

/cc @amshinde @mcastelino
A change to any of the gRPC protocol buffer files (`*.proto`) should require two additional approvals due to the potential impact it could have across the system. Fixes kata-containers#175. Signed-off-by: James O. D. Hunt <[email protected]>

+1
Has anyone found a different workaround than these 3:
?
Have you tried with the netmon?

Yes, but still not able to reach Docker's embedded DNS.
@jodh-intel Is there a reason this got closed without a resolution? I've currently botched together something to handle this, but it certainly doesn't replace the native functionality. There is at least one potential fix above.
@mcassaniti, please see http://lists.katacontainers.io/pipermail/kata-dev/2021-April/001819.html, where you'll find the explanation of why the issue was closed.

Thank you very much for the feedback.
From @gvancuts on March 1, 2018 20:55
Description of problem
Applications built using Docker Compose will fail if the services that compose them depend on name resolution to talk to each other.
Expected result
Docker Compose applications using name resolution would continue to work correctly.
Actual result
The services are not able to talk to each other. This can easily be reproduced by following steps 1 to 4 of this Docker Compose Getting Started guide: https://docs.docker.com/compose/gettingstarted/. It works correctly if you use Docker with its default runtime (runc) but fails as soon as you switch to Clear Containers 3's runtime (cc-runtime).

Copied from original issue: clearcontainers/runtime#1042
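For readers reproducing this, a compose file in the spirit of the guide's example looks like the following. The image names here are illustrative (the guide builds its own web image); the point is that web reaches its dependency by the DNS name redis, which resolves via Docker's embedded DNS on the compose-created network and is exactly the resolution that fails under cc-runtime.

```yaml
version: "3"
services:
  web:
    image: my-flask-app     # hypothetical image; the guide builds its own
    ports:
      - "5000:5000"
    depends_on:
      - redis
  redis:
    image: redis:alpine     # "web" must resolve the hostname "redis"
```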