Handle helm upgrade of AWS VPC CNI more gracefully #57
Comments
For sure, but how? To my knowledge, it's not possible to blindly overwrite existing resources without errors. But the behaviour has also changed in the last couple of releases of Helm. |
For a freshly set-up EKS cluster, the above quoted statement:
is just not true:
The chart simply cannot be installed into an existing EKS cluster. |
Thanks for the heads up @3oris, this is embarrassing indeed. I will work on getting an upgrade job placed into the VPC CNI system once we move the chart sources in there. |
I removed the daemon set object & service account before applying the change and managing CNI with helm. |
@mohsen0 for me it doesn't. Helm complains about the pre-existing CRD next. I know that I could delete all related resources and then apply the chart. For me the question is more or less: should an EKS cluster come with a pre-installed CNI plugin if, after all, the CNI plugin won't be part of the "managed service" promise? Now the thing comes with an unmanaged component out of the box that leaves me to manage it myself, but adequate means of automation (in terms of CD) cannot be applied, and I have to work around that. |
The CNI chart is not compatible with Helm v3; we should move the CRD to the crds directory. |
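For context, Helm v3 only installs CRDs placed in a chart's top-level crds directory and never templates or upgrades them, so the fix would be a layout roughly like this (file names here are illustrative, not the chart's actual ones):
aws-vpc-cni/
  Chart.yaml
  crds/
    eniconfig.yaml        # ENIConfig CRD, applied once on install by Helm v3
  templates/
    daemonset.yaml
    serviceaccount.yaml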
@3oris I understand there is room for more improvement and clarity, but I suggest a more constructive, positive approach; then your contributions will be much more appreciated. |
CRD issue aside, you can just install the chart, it fails, then purge it, then reinstall, and all is done. Obviously this deletes all the resources and then recreates them with the chart. I've done this method many times now and it works fine without any interruption. |
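For reference, with Helm v2 that install/purge/reinstall sequence would look roughly like this; the eks repo alias and the aws-vpc-cni release name are assumptions, not taken from the comment:
helm install --name aws-vpc-cni --namespace kube-system eks/aws-vpc-cni   # fails with: "aws-node" already exists
helm delete --purge aws-vpc-cni                                           # deletes the release and the resources it created
helm install --name aws-vpc-cni --namespace kube-system eks/aws-vpc-cni   # clean reinstall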
This was true for older versions of Helm. With 2.16 it doesn't work, and probably not with Helm 3 either. |
Here is how we work around this in Terraform for now: we remove all aws-node related pre-deployed resources as soon as the control plane becomes available (and before the worker nodes become ready) and then apply the Helm chart, roughly as sketched below.
This can also be applied to an existing cluster. As @max-rocket-internet stated, there should be no noticeable interruption. Now you could annotate the aws-node service account via the |
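A minimal kubectl/Helm sketch of that remove-then-install flow (Helm v3 syntax), assuming the standard aws-node resource names and an eks repo alias; both are assumptions, so verify against your cluster before deleting anything:
kubectl -n kube-system delete daemonset aws-node
kubectl -n kube-system delete serviceaccount aws-node
kubectl delete clusterrole aws-node
kubectl delete clusterrolebinding aws-node
kubectl delete crd eniconfigs.crd.k8s.amazonaws.com
helm install aws-vpc-cni eks/aws-vpc-cni --namespace kube-system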
Note that as of Helm 3.2.0, there is theoretically a way to adopt existing resources into helm if you add appropriate annotations and labels to it. helm/helm#7649 Just add the following annotations and labels to existing objects:
In theory, you can add the annotations and labels via kubectl annotate/kubectl label to the objects that AWS pre-creates, then run Helm. I haven't tested this yet, and this is still ugly if you're using something like Terraform, but in theory you don't have delete the Kubernetes objects now. |
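As an illustration, a sketch of that adoption step for the aws-node DaemonSet, assuming the chart will be installed as a release named aws-vpc-cni in kube-system; the same annotate/label commands would be repeated for the ServiceAccount, ClusterRole, ClusterRoleBinding and CRD:
kubectl -n kube-system annotate --overwrite daemonset aws-node \
  meta.helm.sh/release-name=aws-vpc-cni \
  meta.helm.sh/release-namespace=kube-system
kubectl -n kube-system label --overwrite daemonset aws-node \
  app.kubernetes.io/managed-by=Helm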
The above suggestion does not work, as matchLabels on the DaemonSet is an immutable field. I am going to attempt to allow us to override that in the chart and make a PR, which may be feasible if it does not try to modify any other immutable fields. |
Now that #192 has been merged with documentation on how to adopt the EKS-deployed resources into the Helm chart, is there anything more to do here? |
What happens when you just delete the existing resources and install the CNI helm chart a few seconds later? Does everything recover, or do you break your cluster and existing nodes? I did that yesterday and everything seemed to continue working just fine until I had replaced all the nodes in my cluster. The AWS NLB stopped working and I only got an "Error empty reply from Server" until I deleted the LB (deleted the ingress-nginx helm chart) and created a new one. Update: I did the exact same thing on another cluster and everything went well; the only difference was that I was a bit faster doing everything. Weird. Maybe it was just an AWS EKS bug, since I did both changes together with an EKS upgrade. |
Would it be an option to use |
I don't think so; I don't think Helm will run the pre-install hooks if it's not going to do the install because there are conflicting resources it cannot adopt. |
VPC-CNI is fixing the CRDs issue for Helm v3 in aws/amazon-vpc-cni-k8s#2144 |
If someone runs into problems with @3oris's great workaround (see above), you can try replacing the shell script with this one. Some things have changed since he posted his solution, so I applied some fixes. This works for me now; thanks a lot @3oris for your solution! Fixes:
#!/bin/bash
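# Deletes the pre-installed aws-node resources using the matching upstream manifest,
# so that the Helm chart can be installed cleanly afterwards.
# Required environment variables (exported by the caller, e.g. the Terraform local-exec provisioner below):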
#CLUSTER_ENDPOINT
#CLUSTER_CA
#CLUSTER_TOKEN
#REGION
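# Wait until the cluster API server responds to health checks.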
until curl -k -s $CLUSTER_ENDPOINT/healthz >/dev/null; do sleep 4; done
MANIFEST_FILE=`mktemp -t manifest_`
CONFIG_FILE=`mktemp -t config_`
CA_FILE=`mktemp -t ca_`
trap "{ rm -f $MANIFEST_FILE $CONFIG_FILE $CA_FILE; }" EXIT
echo $CLUSTER_CA | base64 -d > $CA_FILE
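# Derive the release line (e.g. 1.7) from the amazon-k8s-cni image tag of the running aws-node DaemonSet.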
VERSION=$(kubectl get ds aws-node -n kube-system -o yaml \
--kubeconfig $CONFIG_FILE \
--server $CLUSTER_ENDPOINT \
--certificate-authority $CA_FILE \
--token "$CLUSTER_TOKEN" \
| grep -i "image: [0-9]" | grep amazon-k8s-cni: | cut -d "/" -f 2 | cut -d ":" -f 2 | sed -E 's#^v([[:digit:]]+)\.([[:digit:]]+)\..+$#\1.\2#g')
curl -sqL \
https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-$VERSION/config/master/aws-k8s-cni.yaml \
| sed -e "s#us-west-2#$REGION#" > $MANIFEST_FILE
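# Delete every object defined in the upstream manifest for that release and wait for the deletion to finish.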
kubectl delete \
--kubeconfig $CONFIG_FILE \
--server $CLUSTER_ENDPOINT \
--certificate-authority $CA_FILE \
--token "$CLUSTER_TOKEN" \
--filename $MANIFEST_FILE \
--wait
seems like TF doesn't like
resource "null_resource" "remove_aws_vpc_cni_plugin" {
provisioner "local-exec" {
environment = {
CLUSTER_ENDPOINT = data.aws_eks_cluster.cluster.endpoint
CLUSTER_CA = data.aws_eks_cluster.cluster.certificate_authority.0.data
CLUSTER_TOKEN = data.aws_eks_cluster_auth.cluster.token
}
command = format("%s/remove-aws-vpc-cni-plugin.sh", path.module)
}
}
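# Install the chart only once the pre-installed resources have been removed (enforced via depends_on).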
resource "helm_release" "vcni" {
depends_on = [null_resource.remove_aws_vpc_cni_plugin]
repository = "https://aws.github.io/eks-charts"
name = "aws-vpc-cni"
chart = "aws-vpc-cni"
namespace = "kube-system"
set {
name = "env.ENABLE_PREFIX_DELEGATION"
value = true
}
} |
Closing as the |
According to the documentation in the aws-vpc-cni Helm chart,
“If you receive an error similar to Error: release aws-vpc-cni failed: "aws-node" already exists, simply rerun the above command.”
This causes issues with automation. This should be handled automatically in the Helm chart and not require a rerun of the command by the user.
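For reference, the documented flow amounts to running the chart's install command and, if that error appears, running it again; a rough sketch, where the eks repo alias and release name are assumptions rather than quotes from the chart README:
helm repo add eks https://aws.github.io/eks-charts
helm install aws-vpc-cni eks/aws-vpc-cni --namespace kube-system
# if this fails with: Error: release aws-vpc-cni failed: "aws-node" already exists
# the chart documentation says to simply rerun the same command
helm install aws-vpc-cni eks/aws-vpc-cni --namespace kube-system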