How do I plan an upgrade strategy for an Amazon EKS cluster?

When I upgrade my Amazon Elastic Kubernetes Service (Amazon EKS) cluster, I want to follow best practices.

Short description

New Kubernetes versions can introduce significant changes to your Amazon EKS cluster. After you upgrade a cluster, you can't downgrade it. Instead of performing an in-place upgrade, you can migrate your workloads to a new cluster that runs the newer Kubernetes version. If you choose to migrate to new clusters, then cluster backup and restore tools like VMware's Velero can help you migrate. For more information, see Velero on the GitHub website.

To see current and past versions of Kubernetes that are available for Amazon EKS, see the Amazon EKS Kubernetes release calendar.

Resolution

Prepare for an upgrade

Before you begin your cluster upgrade, note the following requirements:

Review major updates for Amazon EKS and Kubernetes

Review all documented changes for the upgrade version, and note any required upgrade steps. Also, note any requirements or procedures that are specific to Amazon EKS managed clusters.

Refer to the following resources for any major updates to Amazon EKS clusters platform versions and Kubernetes versions:

For more information on Kubernetes upstream versions and major updates, see the following documentation:

Understand the Kubernetes deprecation policy

When an API is upgraded, the earlier API is deprecated and eventually removed. To understand how APIs might be deprecated in a newer version of Kubernetes, read the deprecation policy on the Kubernetes website.

To check whether you use any deprecated API versions in your cluster, use Kube No Trouble (kubent). For more information, see kubent on the GitHub website. If you use deprecated API versions, then upgrade your workloads before you upgrade your Kubernetes cluster.

To convert Kubernetes manifest files between different API versions, use the kubectl convert plugin. For more information, see Install kubectl convert plugin on the Kubernetes website.

What to expect during a Kubernetes upgrade

When you upgrade your cluster, Amazon EKS launches new API server nodes with the upgraded Kubernetes version to replace the existing nodes. Amazon EKS then runs infrastructure and readiness health checks on the new nodes to verify that they work as expected. If any of these checks fail, then Amazon EKS rolls back the infrastructure deployment, and your cluster remains on the previous Kubernetes version. This rollback doesn't affect any running applications, and you can recover the cluster, if needed. During the upgrade process, you might experience minor service interruptions.

Upgrade the control plane and data plane

To upgrade an Amazon EKS cluster, you must update two main components: the control plane and the data plane. When you upgrade these components, keep the following considerations in mind.

In-place Amazon EKS cluster upgrades

For in-place upgrades, you can upgrade only to the next highest Kubernetes minor version. If there are multiple versions between your current cluster version and the target version, then you must upgrade to each version sequentially. For each in-place Kubernetes cluster upgrade, complete the following tasks:

  • Update your Kubernetes manifests and update deprecated or removed APIs, as required.
  • Upgrade the cluster control plane.
  • Upgrade the nodes in your cluster.
  • Update your Kubernetes add-ons and custom controllers, as required.

For more information, see Planning and executing Kubernetes version upgrades in Amazon EKS in Planning Kubernetes upgrades with Amazon EKS. Also, see Best practices for cluster upgrades on the GitHub website.
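Because in-place upgrades move one minor version at a time, it can help to list the intermediate versions before you start. The following Python sketch computes that sequence. The function name upgrade_path is hypothetical, and the sketch assumes that both versions share the same major version.

```python
def upgrade_path(current, target):
    """List each minor version to upgrade through, in order,
    to get from current to target (for example '1.24' -> '1.27')."""
    cur_major, cur_minor = (int(p) for p in current.split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch handles a single major version only")
    if tgt_minor < cur_minor:
        raise ValueError("a cluster can't be downgraded")
    # One in-place upgrade per minor version between current and target.
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.24", "1.27"))  # ['1.25', '1.26', '1.27']
```

Each entry in the returned list represents one full pass through the tasks above: update manifests, upgrade the control plane, upgrade the nodes, and then update add-ons.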

Blue/green or canary Amazon EKS clusters migration

A blue/green or canary upgrade strategy can be more complex, but the strategy allows upgrades with easy rollback capability and no downtime. For a blue/green or canary upgrade, see Blue/green or canary Amazon EKS clusters migration for stateless ArgoCD workloads.

Upgrade Amazon EKS managed node groups

Important: A node's kubelet can't be newer than kube-apiserver. Also, it can't be more than two minor versions earlier than kube-apiserver. For example, suppose that kube-apiserver is at version 1.24. In this case, a kubelet is supported only at versions 1.24, 1.23, and 1.22.
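The version skew rule above can be expressed as a small check. The following Python sketch is illustrative only. The function name kubelet_is_supported is hypothetical, and the max_skew default of 2 is an assumption taken from the rule as stated in this article.

```python
def kubelet_is_supported(apiserver_version, kubelet_version, max_skew=2):
    """Return True if the kubelet is not newer than kube-apiserver and is
    at most max_skew minor versions earlier."""
    api_major, api_minor = (int(p) for p in apiserver_version.split(".")[:2])
    kube_major, kube_minor = (int(p) for p in kubelet_version.split(".")[:2])
    if api_major != kube_major:
        return False
    # A negative difference means that the kubelet is newer than kube-apiserver.
    return 0 <= api_minor - kube_minor <= max_skew


for kubelet in ["1.25", "1.24", "1.23", "1.22", "1.21"]:
    print(kubelet, kubelet_is_supported("1.24", kubelet))
```

With kube-apiserver at version 1.24, the check returns True for kubelet versions 1.24, 1.23, and 1.22, and False for 1.25 and 1.21.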

To completely upgrade your managed node groups, complete the following steps:

  1. Upgrade your Amazon EKS cluster control plane components to the latest version.
  2. Update your nodes in the managed node group.
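The order of these two steps can be sketched in Python. The following example assumes that the eks argument behaves like a boto3 EKS client, and it uses that client's update_cluster_version and update_nodegroup_version operations with the cluster_active and nodegroup_active waiters. The function name and its arguments are hypothetical, and error handling is omitted.

```python
def upgrade_cluster_then_nodegroup(eks, cluster_name, nodegroup_name, target_version):
    """Upgrade the control plane first, and then the managed node group.

    eks is assumed to behave like a boto3 EKS client.
    """
    # Step 1: upgrade the control plane, and wait for the cluster to be ACTIVE.
    eks.update_cluster_version(name=cluster_name, version=target_version)
    eks.get_waiter("cluster_active").wait(name=cluster_name)

    # Step 2: update the nodes in the managed node group to the same version.
    eks.update_nodegroup_version(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name,
        version=target_version,
    )
    eks.get_waiter("nodegroup_active").wait(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name,
    )
```

To run the sketch against a real cluster, you might pass in a client that you create with boto3.client("eks"). Because the function receives the client as an argument, you can also pass in a stub to test the call order without contacting AWS.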

Migrate to Amazon EKS managed node groups

If you use self-managed node groups, then you can migrate your workload to Amazon EKS managed node groups with no downtime. For more information, see Seamlessly migrate workloads from EKS self-managed node group to EKS-managed node groups.

Identify and upgrade downstream dependencies (add-ons)

Clusters often contain outside products, such as ingress controllers, continuous delivery systems, monitoring tools, and other workflows. When you update your Amazon EKS cluster, you must also update your add-ons and third-party tools. Make sure you understand how add-ons work with your cluster and how they're updated.

Note: It's a best practice to use managed add-ons instead of self-managed add-ons.
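For managed add-ons, one way to find a version that matches your target Kubernetes version is to inspect the output of the Amazon EKS DescribeAddonVersions API. The following Python sketch is illustrative. It assumes a response dictionary shaped like the one that the boto3 describe_addon_versions call returns (addons, addonVersions, compatibilities, clusterVersion, defaultVersion). The function name is hypothetical.

```python
def default_addon_version(response, addon_name, cluster_version):
    """Return the add-on version marked as the default for cluster_version,
    or None if the response doesn't contain one."""
    for addon in response.get("addons", []):
        if addon.get("addonName") != addon_name:
            continue
        for version in addon.get("addonVersions", []):
            for compat in version.get("compatibilities", []):
                if (
                    compat.get("clusterVersion") == cluster_version
                    and compat.get("defaultVersion")
                ):
                    return version["addonVersion"]
    return None
```

You might call this function with the result of describe_addon_versions(addonName=...) before and after a control plane upgrade to see whether the default add-on version changes.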

Review the following examples of common add-ons and relevant upgrade documentation:

Upgrade AWS Fargate nodes

To update a Fargate node, delete the pod that the node represents. Then, after you update your control plane, redeploy the pod. Any new pods that you launch on Fargate have a kubelet version that matches your cluster version. Existing Fargate pods aren't changed.

Note: To keep Fargate pods secure, Amazon EKS must periodically patch them. Amazon EKS tries to update the pods in a way that minimizes disruption. However, if pods can't be successfully evicted, then Amazon EKS deletes them. To minimize disruption, see Fargate OS patching.
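Because existing Fargate pods keep their earlier kubelet version, you might want to list the nodes that are still behind the cluster version after a control plane upgrade. The following Python sketch is illustrative. It assumes that you already collected a mapping of node names to kubelet version strings, for example from the VERSION column of kubectl get nodes. The function name and the sample node names in the usage note are hypothetical.

```python
import re


def nodes_behind_cluster(kubelet_versions, cluster_version):
    """Return the names of nodes whose kubelet minor version is earlier
    than cluster_version.

    kubelet_versions maps a node name to a version string, such as
    'v1.23.17-eks-example'.
    """
    target = tuple(int(p) for p in cluster_version.split(".")[:2])
    behind = []
    for node, version in sorted(kubelet_versions.items()):
        match = re.match(r"v?(\d+)\.(\d+)", version)
        if match and (int(match.group(1)), int(match.group(2))) < target:
            behind.append(node)
    return behind
```

For example, if a hypothetical node fargate-node-a reports v1.23.17-eks-example and the cluster is at version 1.24, then the function returns that node. The pod that the node represents is a candidate to delete and redeploy.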

Upgrade groupless nodes that Karpenter creates

When you set a value for ttlSecondsUntilExpired, this value activates node expiry. After nodes reach the defined age in seconds, Karpenter deletes them. This deletion occurs even if the nodes are in use. This process allows you to replace nodes with newly provisioned instances, and therefore upgrade them. When a node is replaced, Karpenter uses the latest Amazon EKS optimized AMIs. For more information, see Disruption on the Karpenter website.

The following example shows a node that's deprovisioned with ttlSecondsUntilExpired, and then replaced with an upgraded instance:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type         # optional, set to on-demand by default, spot if both are listed
      operator: In
      values: ["spot"]
  limits:
    resources:
      cpu: 1000                               # optional, recommended to limit total provisioned CPUs
      memory: 1000Gi
  ttlSecondsAfterEmpty: 30                    # optional, but never scales down if not set
  ttlSecondsUntilExpired: 2592000             # optional, nodes are recycled after 30 days but never expire if not set
  provider:
    subnetSelector:
      karpenter.sh/discovery/CLUSTER_NAME: '*'
    securityGroupSelector:
      kubernetes.io/cluster/CLUSTER_NAME: '*'

Note: Karpenter doesn't automatically add jitter to this value. If you create multiple instances in a short amount of time, then the instances expire near the same time. To prevent excessive workload disruption, define a pod disruption budget. For more information, see Specifying a disruption budget for your application on the Kubernetes website.
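The ttlSecondsUntilExpired value in the preceding example is 30 days expressed in seconds. The following Python sketch shows that arithmetic, and it also shows why nodes that are created close together expire close together when no jitter is added. The function names and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 24 * 60 * 60


def days_to_ttl_seconds(days):
    """Convert a node lifetime in days to a ttlSecondsUntilExpired value."""
    return days * SECONDS_PER_DAY


def expiry_times(created_at, ttl_seconds):
    """Return the expiry time for each creation time. No jitter is added,
    so the gaps between expiry times equal the gaps between creation times."""
    return [t + timedelta(seconds=ttl_seconds) for t in created_at]


ttl = days_to_ttl_seconds(30)
print(ttl)  # 2592000

# Two hypothetical nodes that are created five minutes apart.
first = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
created = [first, first + timedelta(minutes=5)]
for expires in expiry_times(created, ttl):
    print(expires.isoformat())
```

The two nodes in this sketch also expire five minutes apart, which is why a pod disruption budget helps to limit how many pods are disrupted at one time.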

AWS OFFICIAL
Updated 2 months ago