· GoLiveApp

EKS Cluster Rollback: Quick Reference for Upgrade Recovery

Everything you need before rolling back an EKS cluster — the 7-day window, what moves and what doesn't, step-by-step CLI commands, node group handling, and the gotchas that will burn you.

#kubernetes #eks #aws #devops #sre #upgrades

EKS lets you roll back the Kubernetes control plane to the previous minor version after an in-place upgrade. The window is 7 days. After that, it’s gone — no exceptions, no AWS Support workaround.

This is a field reference, not a tutorial. Use it when something went wrong after an upgrade and you’re deciding whether to roll back.


What Gets Rolled Back vs. What Doesn’t

| Gets Rolled Back | NOT Rolled Back |
|---|---|
| Kubernetes API server version | etcd data (all cluster state preserved) |
| Control plane components + config | Customer workloads (pods keep running) |
| Platform version (latest for N-1) | EKS add-ons (manage separately) |
| EKS Auto Mode worker nodes | Managed Node Groups (your action required) |
| | Self-managed and hybrid nodes |
| | Persistent volumes and data |

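Workloads keep running through the control plane rollback. The rollback itself does not restart your pods; the API server version changes underneath them. Pods are rescheduled only when the nodes they run on are replaced during node preparation.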


Prerequisites — Check All Before You Start

  • Cluster was upgraded in-place (not created at current version — those can’t roll back)
  • Within 7 days of the upgrade completing
  • Rolling back exactly one minor version (N → N-1 only — no skipping)
  • Target version is currently a supported EKS version
  • Cluster is in ACTIVE status — no in-progress updates
  • If target version is in extended support → change upgrade policy to EXTENDED first
  • Cluster was not auto-upgraded at the end of extended support (if it was, rollback is impossible)
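The "exactly one minor version" prerequisite is easy to check before calling any AWS API. The sketch below is a hypothetical helper, not an AWS command; it assumes versions are plain MAJOR.MINOR strings such as 1.31.

```shell
# Succeeds only when the target is exactly one minor version below current.
# Hypothetical helper; assumes both arguments look like MAJOR.MINOR (e.g. 1.31).
is_one_minor_back() (
  current=$1
  target=$2
  cur_major=${current%%.*}; cur_minor=${current#*.}
  tgt_major=${target%%.*};  tgt_minor=${target#*.}
  [ "$cur_major" = "$tgt_major" ] && [ "$((cur_minor - 1))" -eq "$tgt_minor" ]
)

# Example:
#   is_one_minor_back 1.31 1.30 && echo "target version is eligible"
```

The other prerequisites (in-place upgrade, ACTIVE status, upgrade policy) still have to be confirmed against the cluster itself.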

The 7-Day Hard Stop

The rollback window is exactly 7 days from when the upgrade completed — not from when you noticed the problem.

Set a calendar alert the moment any upgrade finishes. If you hit day 8, your options are:

  • Fix forward by upgrading again once the issue is resolved
  • Manual intervention at the application layer

There is no --force workaround for an expired window.
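To make the deadline concrete, here is a minimal sketch that turns a recorded upgrade-completion time into a rollback deadline. It assumes GNU date and that you recorded the completion time yourself as an ISO 8601 UTC timestamp; the function names are hypothetical, and treating the boundary as strictly "less than 7 days" is an assumption.

```shell
# Prints the rollback deadline (UTC) for an upgrade that completed at $1.
# $1 is an ISO 8601 UTC timestamp, e.g. 2025-03-01T10:00:00Z. Requires GNU date.
rollback_deadline() (
  done_epoch=$(date -u -d "$1" +%s) || exit 1
  date -u -d "@$((done_epoch + 7 * 86400))" +%Y-%m-%dT%H:%M:%SZ
)

# Succeeds while the 7-day window is still open.
# $2 (optional) overrides "now" as epoch seconds, which is handy for testing.
rollback_window_open() (
  done_epoch=$(date -u -d "$1" +%s) || exit 1
  now=${2:-$(date -u +%s)}
  [ "$now" -lt "$((done_epoch + 7 * 86400))" ]
)

# Example:
#   rollback_deadline 2025-03-01T10:00:00Z
#   rollback_window_open 2025-03-01T10:00:00Z || echo "window expired: fix forward"
```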


Rollback Decision Flow

[Figure: EKS cluster rollback decision flowchart]


Step-by-Step with CLI Commands

Step 1 — Check rollback readiness insights

aws eks list-insights \
  --cluster-name my-cluster \
  --region us-east-1 \
  --filter '{"categories": ["ROLLBACK_READINESS"]}'

Get detail on a specific insight:

aws eks describe-insight \
  --cluster-name my-cluster \
  --region us-east-1 \
  --id <insight-id>

Manually refresh insight data after resolving an issue:

aws eks start-insights-refresh \
  --cluster-name my-cluster \
  --region us-east-1

Insight status guide:

| Status | Blocks Rollback? | Action |
|---|---|---|
| PASSING | No | Proceed |
| WARNING | No | Advisory; review but proceed |
| ERROR | Yes | Resolve, or use --force |
| UNKNOWN | Yes | Resolve, or use --force |
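The status guide above can be turned into a simple gate for a script. This is a sketch with a hypothetical helper name; it reads whitespace-separated statuses on stdin and fails on the two statuses that block rollback. The commented pipeline assumes the list-insights response exposes each status at insights[].insightStatus.status.

```shell
# Reads insight statuses (whitespace-separated) on stdin and fails if any of
# them would block a rollback (ERROR or UNKNOWN). Hypothetical helper.
insights_allow_rollback() (
  for status in $(cat); do
    case $status in
      ERROR|UNKNOWN)
        echo "blocking insight status: $status" >&2
        exit 1
        ;;
    esac
  done
)

# Example (not run here):
#   aws eks list-insights --cluster-name my-cluster --region us-east-1 \
#     --filter '{"categories": ["ROLLBACK_READINESS"]}' \
#     --query 'insights[].insightStatus.status' --output text \
#     | insights_allow_rollback
```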

Step 2 — Prepare worker nodes

Nodes cannot run a version newer than the control plane after rollback.

| Node Type | What To Do |
|---|---|
| EKS Auto Mode | Nothing; EKS handles it automatically before touching the control plane |
| Managed Node Groups | Run update-nodegroup-version (below) |
| Self-managed / Hybrid | Update AMIs to target version manually |
| Fargate | Delete pods running current version, then proceed |

Managed Node Group rollback:

aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --kubernetes-version 1.30 \
  --region us-east-1

This respects your node group’s maxUnavailable / maxUnavailablePercentage settings.
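If the cluster has several managed node groups, the same command can be looped. The sketch below is one possible approach, not a prescribed procedure: the function name is hypothetical, and it processes node groups one at a time, waiting for each to return to ACTIVE before starting the next.

```shell
# Rolls back every managed node group in a cluster, one at a time.
# Usage: rollback_all_nodegroups <cluster> <target-version> <region>
# Hypothetical helper built on list-nodegroups, update-nodegroup-version,
# and the nodegroup-active waiter.
rollback_all_nodegroups() (
  cluster=$1
  version=$2
  region=$3
  groups=$(aws eks list-nodegroups --cluster-name "$cluster" --region "$region" \
             --query 'nodegroups[]' --output text) || exit 1
  for ng in $groups; do
    aws eks update-nodegroup-version --cluster-name "$cluster" \
      --nodegroup-name "$ng" --kubernetes-version "$version" \
      --region "$region" || exit 1
    # Block until this node group is ACTIVE again before moving on.
    aws eks wait nodegroup-active --cluster-name "$cluster" \
      --nodegroup-name "$ng" --region "$region" || exit 1
  done
)

# Example (not run here):
#   rollback_all_nodegroups my-cluster 1.30 us-east-1
```

Going sequentially is slower than starting all updates at once, but it limits how much capacity is churning at any moment.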

Fargate note: rollback is not supported natively for Fargate nodes. Delete pods running the current kubelet version before initiating the control plane rollback. Those pods will re-launch with the rolled-back version when redeployed.


Step 3 — Downgrade incompatible add-ons

EKS does not roll back add-on versions automatically. Check compatibility before touching the control plane.

List current add-ons:

aws eks list-addons --cluster-name my-cluster --region us-east-1

Downgrade a specific add-on:

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --addon-version v1.12.0-eksbuild.2 \
  --region us-east-1

Rollback readiness insights check EKS-managed add-ons only. Self-managed add-ons are your responsibility to validate.
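One way to pick a downgrade target for an EKS-managed add-on is to list the versions published for the target Kubernetes version and check whether the installed one is among them. The helper below is a hypothetical sketch; the commented pipeline assumes describe-addon-versions output is filtered to addons[].addonVersions[].addonVersion.

```shell
# Succeeds if add-on version $1 appears in the whitespace-separated list on stdin.
# Hypothetical helper; feed it the versions compatible with the target release.
addon_version_listed() (
  wanted=$1
  for v in $(cat); do
    [ "$v" = "$wanted" ] && exit 0
  done
  exit 1
)

# Example (not run here):
#   aws eks describe-addon-versions --addon-name vpc-cni \
#     --kubernetes-version 1.30 --region us-east-1 \
#     --query 'addons[].addonVersions[].addonVersion' --output text \
#     | addon_version_listed v1.12.0-eksbuild.2 \
#     || echo "installed version is not listed for 1.30: downgrade needed"
```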


Step 4 — Initiate the rollback

aws eks update-cluster-version \
  --name my-cluster \
  --kubernetes-version 1.30 \
  --region us-east-1

Save the update.id from the response — you’ll need it to monitor progress.
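A small wrapper can capture the update ID directly instead of copying it out of the JSON response. This is a sketch with a hypothetical function name; it relies on the AWS CLI's global --query and --output options to print only update.id.

```shell
# Starts the rollback and prints only the update ID.
# Usage: start_rollback <cluster> <target-version> <region>
# Hypothetical wrapper around the update-cluster-version call shown above.
start_rollback() (
  aws eks update-cluster-version --name "$1" --kubernetes-version "$2" \
    --region "$3" --query 'update.id' --output text
)

# Example (not run here):
#   update_id=$(start_rollback my-cluster 1.30 us-east-1) || exit 1
#   echo "rollback started: $update_id"
```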

To bypass ERROR or UNKNOWN insight checks:

aws eks update-cluster-version \
  --name my-cluster \
  --kubernetes-version 1.30 \
  --force \
  --region us-east-1

--force does NOT bypass:

  • The 7-day window
  • The “created at current version” check
  • The sequential version check
  • Auto Mode disruption controls (PDBs and do-not-disrupt annotations still honored)

Step 5 — Monitor

aws eks describe-update \
  --name my-cluster \
  --region us-east-1 \
  --update-id <your-update-id>

Status transitions:

| Cluster Type | Path |
|---|---|
| Standard | InProgress → Successful or InProgress → Failed |
| Auto Mode | Stays ACTIVE during node rollback → UPDATING for control plane |

When you see Successful, the rollback is complete. Verify your add-ons are healthy and your workloads are behaving as expected.
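Rather than re-running describe-update by hand, the check can be polled. The sketch below uses a hypothetical function name and an arbitrary 30-second default interval; it treats any status other than InProgress or Successful as a failure.

```shell
# Polls an EKS update until it leaves InProgress.
# Usage: wait_for_update <cluster> <update-id> <region> [interval-seconds]
# Prints the final status; succeeds only on Successful. Hypothetical helper.
wait_for_update() (
  cluster=$1
  update_id=$2
  region=$3
  interval=${4:-30}
  while :; do
    status=$(aws eks describe-update --name "$cluster" --update-id "$update_id" \
               --region "$region" --query 'update.status' --output text) || exit 1
    case $status in
      Successful) echo "$status"; exit 0 ;;
      InProgress) sleep "$interval" ;;
      *)          echo "$status"; exit 1 ;;
    esac
  done
)

# Example (not run here):
#   wait_for_update my-cluster "$update_id" us-east-1 && echo "rollback complete"
```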


Gotchas

Changes during rollback aren’t captured. Insights are point-in-time. If you create resources using new-version APIs after the insight check runs but before rollback completes, those resources persist in etcd. They may be incompatible with the rolled-back API server and won’t be garbage collected automatically.

Extended support charges restart immediately. If you roll back from a standard-support version to an extended-support version, extended support billing resumes the moment rollback completes. Budget accordingly.

CloudFormation doesn’t trigger rollback. If a CFN stack update fails and reverts to a template with a lower Kubernetes version, that does not trigger a cluster version rollback. You must call UpdateClusterVersion explicitly — CFN template changes alone do nothing.

Sequential rollback only. You can only go N → N-1. If you upgraded 1.31 → 1.32 → 1.33, you can roll back to 1.32. Getting to 1.31 requires a second rollback within its own 7-day window.

Incompatible resources stay in etcd. If you used --force to bypass insight checks, any resources created with newer APIs remain persisted. The API server on N-1 won’t recognize them — they’re inert until you clean them up manually.


Quick Reference Card

ROLLBACK ELIGIBILITY
  ✓ In-place upgraded cluster   ✓ Within 7 days
  ✓ N → N-1 only               ✓ Cluster ACTIVE

ORDER OF OPERATIONS
  1. Check insights (ROLLBACK_READINESS)
  2. Prepare worker nodes (MNG: update-nodegroup-version)
  3. Downgrade incompatible add-ons
  4. aws eks update-cluster-version --kubernetes-version N-1
  5. Monitor: aws eks describe-update

AUTO MODE: nodes handled automatically — no step 2 needed

FORCE FLAG: bypasses insight checks only
           does not bypass: 7-day window, version checks,
           Auto Mode disruption controls
