Pod startup latency with Calico and EKS #1629
The solution as it stands seems very specific to Calico (env var and label). Can we collaborate with the Calico team to make this a generic label that the Calico operator can consume? That would let this solution extend to other third-party network policy implementations that are supported with the AWS VPC CNI.
@sramabad1 - Yes, we did talk to the Calico team about a generic label that other providers could also leverage, but that would also need agreement from the other providers on the naming. Hence, as a short-term solution, we have added this knob; once we get agreement, we will deprecate the knob and use the generic label. Please let me know your thoughts.
The annotation is removed in the DelNetwork path only if the knob is enabled. I wonder whether the annotation cleanup would fail to kick in if the Pod gets deleted after the knob is turned off. If you still want to keep the deletion logic as it is, additional logic would be required to visit all pods on the node and adjust the annotation when the knob is turned off.
Agreed, because the policy expects the value to be empty. I was thinking this could be documented, since the behavior is the same as, for instance, with custom networking.
The annotation is mainly for the Pod IP, and the label doesn't have to be related to network policy at all. As long as we can agree with Calico on a generic label name that is not vendor specific, it might be a reasonable path forward. We might not have to wait to converge on a label name with all other providers before committing our code changes with the generic label.
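For reference, the extra pass suggested above (visiting all pods on the node once the knob is turned off) could look roughly like the sketch below. It is not code from this PR: the function name `pods_needing_cleanup` and the plain-dict pod shape are hypothetical, and it only selects the pods whose annotation would need adjusting, without calling the API server.

```python
# Hypothetical sketch, not the PR's implementation.
# Pods are represented as plain dicts shaped like Kubernetes Pod objects.
POD_IP_ANNOTATION = "vpc.amazonaws.com/pod-ips"


def pods_needing_cleanup(pods, knob_enabled):
    """Return the names of pods on this node that still carry the pod IP
    annotation while the knob is off, so a caller could adjust them."""
    if knob_enabled:
        # With the knob on, the annotation is expected to be present.
        return []
    names = []
    for pod in pods:
        metadata = pod.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if POD_IP_ANNOTATION in annotations:
            names.append(metadata.get("name"))
    return names
```

A caller would still need to patch each returned pod; that step is left out here because it depends on the client in use.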
LGTM
* Calico optimization * make format because of older commits * Update the annotation * update env variable
What type of PR is this?
Enhancement
Which issue does this PR fix:
Network connection latency when Calico is installed on EKS clusters with the aws-vpc-cni plugin.
What does this PR do / Why do we need it:
The Calico CNI plugin writes the IP address back to the pod as an annotation with
Key : vpc.amazonaws.com/pod-ips
Value : podIP
to mitigate the delay in kubelet updating Pod.Status.PodIP. This PR leverages the same annotation for aws-vpc-cni, and it can be enabled with the ANNOTATE_POD_IP knob as needed.
Ref: projectcalico/calico#3530 and the upstream issue for the kubelet delay, kubernetes/kubernetes#39113.
The ClusterRole needs to be updated to provide patch capabilities to aws-node for pods.
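To illustrate what writing the annotation involves, here is a minimal sketch of a JSON merge patch body that sets the key above on a pod. It is an illustration only, not the code in this PR: the helper name `build_annotation_patch` is hypothetical, and it assumes the patch is sent as a JSON merge patch, where a null value removes the key.

```python
import json

# Annotation key as given in the PR description.
POD_IP_ANNOTATION = "vpc.amazonaws.com/pod-ips"


def build_annotation_patch(pod_ip):
    """Build a JSON merge patch body for a pod's metadata.

    Passing an IP string sets the annotation; passing None emits null,
    which in a JSON merge patch removes the annotation key.
    """
    patch = {"metadata": {"annotations": {POD_IP_ANNOTATION: pod_ip}}}
    return json.dumps(patch)


if __name__ == "__main__":
    # 10.0.0.5 is a made-up example address.
    print(build_annotation_patch("10.0.0.5"))
```

Sending a body like this is what requires the "patch" verb on pods mentioned above.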
If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
Fixes #493
Testing done on this change:
Yes
Knob disabled -
Knob enabled -
Automation added to e2e:
No
Will this break upgrades or downgrades. Has updating a running cluster been tested?:
No
Does this change require updates to the CNI daemonset config files to work?:
Yes.
To use this feature -
Add clusterRole with "patch" capabilities to pods.
Knob to enable - ANNOTATE_POD_IP
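The two prerequisites above can be sanity-checked with a small sketch like the one below. It is not part of this PR and rests on assumptions: that the knob is read as the string "true" (case-insensitive) and is off when unset, and that the ClusterRole rules are available as a list of dicts with `apiGroups`, `resources`, and `verbs` keys. Both function names are hypothetical.

```python
import os


def annotate_pod_ip_enabled(env=None):
    """Return True if ANNOTATE_POD_IP is set to "true" (assumed format).

    Treating an unset variable as disabled is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    return env.get("ANNOTATE_POD_IP", "false").strip().lower() == "true"


def rules_allow_pod_patch(rules):
    """Check whether any ClusterRole rule grants "patch" on core pods."""
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        if (
            ("" in groups or "*" in groups)
            and ("pods" in resources or "*" in resources)
            and ("patch" in verbs or "*" in verbs)
        ):
            return True
    return False
```

For example, a rule with apiGroups [""], resources ["pods"], and verbs ["get", "list", "watch", "patch"] would satisfy the check, while the same rule without "patch" would not.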
Does this PR introduce any user-facing change?:
no
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.