Pod startup latency with Calico and EKS #1629

Merged 4 commits on Sep 29, 2021

@jayanthvn jayanthvn commented Sep 21, 2021

What type of PR is this?
Enhancement

Which issue does this PR fix:
Network connectivity latency at pod startup when Calico is installed on EKS clusters that use the aws-vpc-cni plugin.

What does this PR do / Why do we need it:
The Calico CNI plugin writes the pod's IP address back to the pod as an annotation (key: vpc.amazonaws.com/pod-ips, value: the pod IP) to mitigate the delay in kubelet updating Pod.Status.PodIP. This PR uses the same annotation for aws-vpc-cni; it can be enabled as needed with the ANNOTATE_POD_IP knob. Ref: projectcalico/calico#3530; upstream issue for the kubelet delay: kubernetes/kubernetes#39113
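
A minimal sketch of what writing the annotation could look like on the wire, assuming the update is sent as a JSON merge patch (the patch type is an assumption; this PR's Go code is not reproduced here). The helper names `build_pod_ip_patch` and `apply_merge_patch` are hypothetical, and the small applier exists only to show the effect locally without a cluster.

```python
import json

# Annotation key named in this PR description.
POD_IP_ANNOTATION = "vpc.amazonaws.com/pod-ips"


def build_pod_ip_patch(pod_ip: str) -> str:
    """Return a JSON merge patch body that sets the pod IP annotation."""
    patch = {"metadata": {"annotations": {POD_IP_ANNOTATION: pod_ip}}}
    return json.dumps(patch, sort_keys=True)


def apply_merge_patch(target: dict, patch: dict) -> dict:
    """Apply a JSON merge patch (RFC 7386 semantics) to a dict, returning a new dict.

    Hypothetical local helper: a real client would send the patch body to the
    API server instead of applying it in-process.
    """
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            # In a merge patch, null removes the key.
            result.pop(key, None)
        elif isinstance(value, dict):
            current = result.get(key)
            base = current if isinstance(current, dict) else {}
            result[key] = apply_merge_patch(base, value)
        else:
            result[key] = value
    return result


if __name__ == "__main__":
    pod = {"metadata": {"name": "demo", "annotations": {}}}
    body = build_pod_ip_patch("192.168.73.105")
    print(apply_merge_patch(pod, json.loads(body)))
```

Because a merge patch touches only the keys it names, other annotations on the pod are left as they are.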

The ClusterRole needs to be updated to grant aws-node the "patch" verb on pods.

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
Fixes #493

Testing done on this change:

Yes

Knob disabled -

Ping Failed 1 times
Ping Failed 2 times
Ping Failed 3 times
Ping Failed 4 times
Ping Failed 5 times
Ping Failed 6 times
Ping Failed 7 times
Ping Failed 8 times
PING 192.168.73.105 (192.168.73.105) 56(84) bytes of data.
64 bytes from 192.168.73.105: icmp_seq=1 ttl=255 time=0.073 ms

--- 192.168.73.105 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms

Knob enabled -

PING 192.168.73.105 (192.168.73.105) 56(84) bytes of data.
64 bytes from 192.168.73.105: icmp_seq=1 ttl=255 time=0.072 ms

--- 192.168.73.105 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
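
The output above is consistent with a loop that pings the new pod until the first reply and counts the failures along the way. A minimal sketch of such a loop, assuming a Linux `ping` binary for the real probe; the actual test script is not part of this PR description, and `wait_for_first_success` and `ping_once` are hypothetical names.

```python
import subprocess
import time
from typing import Callable


def ping_once(ip: str) -> bool:
    """Send a single ICMP echo with a 1 second timeout (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_first_success(
    probe: Callable[[], bool], max_attempts: int = 30, delay_s: float = 0.0
) -> int:
    """Call probe until it succeeds and return the number of failed attempts.

    Raises TimeoutError if the probe never succeeds within max_attempts.
    """
    failures = 0
    for _ in range(max_attempts):
        if probe():
            return failures
        failures += 1
        print(f"Ping Failed {failures} times")
        time.sleep(delay_s)
    raise TimeoutError(f"probe did not succeed after {max_attempts} attempts")
```

For a real run, the probe would be something like `lambda: ping_once("192.168.73.105")`; passing a callable keeps the counting logic testable without network access.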

Automation added to e2e:

No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
No

Does this change require updates to the CNI daemonset config files to work?:

Yes.

To use this feature -

Update the ClusterRole to allow "patch" on pods.
Set the ANNOTATE_POD_IP environment variable to enable the knob.
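
The two changes above can be sketched as edits to manifests that have already been loaded into Python dicts. This is only an illustration: the helper names are hypothetical, the value "true" for ANNOTATE_POD_IP is an assumption, and the real manifests in the repository may be laid out differently.

```python
import copy


def allow_pod_patch(cluster_role: dict) -> dict:
    """Return a copy of a ClusterRole manifest whose pods rule also allows 'patch'.

    Note: if a rule lists pods together with other resources, the verb is
    added for all of them; split the rule first if that is too broad.
    """
    role = copy.deepcopy(cluster_role)
    for rule in role.get("rules", []):
        if "pods" in rule.get("resources", []):
            verbs = rule.setdefault("verbs", [])
            if "patch" not in verbs:
                verbs.append("patch")
    return role


def enable_annotate_pod_ip(container: dict, value: str = "true") -> dict:
    """Return a copy of a container spec with ANNOTATE_POD_IP set.

    The default value "true" is an assumption, not taken from this PR text.
    """
    spec = copy.deepcopy(container)
    env = [e for e in spec.get("env", []) if e.get("name") != "ANNOTATE_POD_IP"]
    env.append({"name": "ANNOTATE_POD_IP", "value": value})
    spec["env"] = env
    return spec
```

Both helpers return modified copies, so the original manifests can still be compared against the result before anything is applied to a cluster.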

Does this PR introduce any user-facing change?:

no


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@srini-ram srini-ram left a comment

The solution as it stands seems very specific to Calico (env var and label). Can we collaborate with the Calico team to make this a generic label that the Calico operator can consume? That would make the solution extensible to other third-party network policy implementations that are supported with AWS VPC CNI.

@jayanthvn

@sramabad1 - Yes, we did talk to the Calico team about a generic label that other providers could also leverage, but that would also need agreement from the other providers on the naming. Hence, as a short-term solution, we have added this knob; once we get agreement, we will deprecate the knob and use the generic label. Please let me know your thoughts.

@srini-ram srini-ram left a comment

The annotation is removed in the DelNetwork path only if the knob is enabled. I wonder whether annotation cleanup would be skipped if a pod gets deleted after the knob is turned off. If you still want to keep the deletion logic as it is, additional logic would be required to visit all pods on the node and adjust the annotation when the knob is turned off.
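
A minimal sketch of the "visit all pods on the node" idea, assuming pods are available as plain dicts shaped like Kubernetes Pod objects. The function name and the selection rule are hypothetical; nothing like this is part of the PR as merged.

```python
from typing import List

# Annotation key named in this PR description.
POD_IP_ANNOTATION = "vpc.amazonaws.com/pod-ips"


def pods_needing_cleanup(pods: List[dict], node_name: str, knob_enabled: bool) -> List[str]:
    """Return names of pods on node_name that still carry the pod IP annotation.

    When the knob is enabled the annotation is expected, so nothing is stale.
    """
    if knob_enabled:
        return []
    stale = []
    for pod in pods:
        on_node = pod.get("spec", {}).get("nodeName") == node_name
        annotations = pod.get("metadata", {}).get("annotations") or {}
        # A missing or empty value counts as already cleaned up.
        if on_node and annotations.get(POD_IP_ANNOTATION):
            stale.append(pod["metadata"]["name"])
    return stale
```

Such a pass would run once when the knob is observed to be off, after which the DelNetwork path no longer has to reason about annotations left over from an earlier configuration.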

@jayanthvn

> Annotation is removed in DelNetwork path only if Knob is enabled. Wondering if annotation clean up wouldn't kick in if the Pod gets deleted after knob is turned off. If you still want to keep the deletion logic as it is, an additional logic would be required to visit all pods on node in order to adjust the annotation when knob is turned off

Agreed, because the policy expects the value to be empty. Could this be documented instead? The behavior is the same with custom networking, for instance.

@srini-ram

> Annotation is removed in DelNetwork path only if Knob is enabled. Wondering if annotation clean up wouldn't kick in if the Pod gets deleted after knob is turned off. If you still want to keep the deletion logic as it is, an additional logic would be required to visit all pods on node in order to adjust the annotation when knob is turned off

> Agreed, because the policy expects the value to be empty. I was thinking if this can be documented? since the behavior is same for instance with custom networking.

> @sramabad1 - Yes we did talk to Calico team about the generic label which other providers can also leverage but that would also need agreement from other providers on the naming. Hence for short term solution we have added this knob, once we get the agreement we will deprecate this knob and use generic label. Please let me know your thoughts?

The annotation is mainly for the pod IP, and the label doesn't have to be related to network policy at all. As long as we can agree with Calico on a generic label name that is not vendor specific, that might be a reasonable path forward. We might not have to wait to converge on a label name with all other providers before committing our code changes with the generic label.

@srini-ram srini-ram self-requested a review September 21, 2021 20:39
@srini-ram srini-ram left a comment

LGTM

@srini-ram srini-ram merged commit 357dfd6 into aws:master Sep 29, 2021
@jayanthvn jayanthvn added this to the v1.10 milestone Sep 29, 2021
jayanthvn added a commit to jayanthvn/amazon-vpc-cni-k8s that referenced this pull request Sep 29, 2021
* Calico optimization

* make format because of older commits

* Update the annotation

* update env variable
srini-ram pushed a commit that referenced this pull request Sep 29, 2021
* Calico optimization

* make format because of older commits

* Update the annotation

* update env variable
jayanthvn added a commit to jayanthvn/amazon-vpc-cni-k8s that referenced this pull request Oct 14, 2021
* Calico optimization

* make format because of older commits

* Update the annotation

* update env variable
jayanthvn added a commit that referenced this pull request Oct 14, 2021
* Calico optimization

* make format because of older commits

* Update the annotation

* update env variable
Successfully merging this pull request may close these issues.

Pod startup connectivity issue when using calico and vpc-cni