
Hardening AWS-managed EKS Cluster Security Group: Alternatives to the default "All All Self" rule?


Hello,

I am working on hardening the security groups for our Amazon EKS clusters to meet strict internal security policies. Our security team flagged the "All traffic, All ports, Self" (self=true / -1 protocol) inbound and outbound rules as overly permissive.

For our architecture, we noticed that the "All traffic, All ports, Self" rule is present in two places:

  1. The default EKS Cluster Security Group (named roughly eks-cluster-sg-<cluster-name>-<id>).
  2. The Worker Node Security Group.
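
For anyone who needs to locate this rule programmatically, here is a minimal sketch that scans one security group description for "All traffic, Self" rules. It assumes the input dict has the shape of an entry in the `SecurityGroups` list returned by EC2 DescribeSecurityGroups; the function name `find_all_self_rules` and the sample group ID and CIDR are hypothetical.

```python
def find_all_self_rules(security_group):
    """Return the directions ("inbound"/"outbound") that carry an All-All-Self rule.

    `security_group` is assumed to look like one entry of the
    `SecurityGroups` list from EC2 DescribeSecurityGroups.
    """
    group_id = security_group["GroupId"]
    findings = []
    for direction, key in (("inbound", "IpPermissions"),
                           ("outbound", "IpPermissionsEgress")):
        for rule in security_group.get(key, []):
            if rule.get("IpProtocol") != "-1":
                continue  # only "all protocols" rules are of interest
            for pair in rule.get("UserIdGroupPairs", []):
                if pair.get("GroupId") == group_id:
                    findings.append(direction)
    return findings


# Hypothetical sample data; the group ID and CIDR are placeholders.
sample_group = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [
        {"IpProtocol": "-1",
         "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ],
    "IpPermissionsEgress": [
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
}

print(find_all_self_rules(sample_group))
```

The sample has a self-referencing all-protocol rule only on the inbound side, so only that direction is reported; the open egress rule to a CIDR is deliberately not counted as "Self".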

[UPDATE — 2026-04-20]: Cluster SG changes have been successfully completed. No issues observed. Details and remaining Worker SG questions added below.


Part 1 — Cluster SG (Resolved ✅)

For our custom Cluster SG, we have replaced the overly open rules with specific inbound and outbound rules:

  • Inbound: TCP 443 from VPC CIDR only (private endpoint, no public access)
  • Outbound: Specific rules to the Worker Node SG by SG ID (TCP 10250, 9443, 8443, 443), plus DNS (53/UDP and 53/TCP) to the VPC CIDR

We confirmed the default eks-cluster-sg-* "All-All-Self" rule is recreated on cluster updates and have documented it as a platform requirement, mitigated by restricted membership (no worker nodes are members of that SG). The "blast radius" is effectively zero per the guidance in the answers below.

My original questions on this topic (answered below):

  1. Is our approach of replacing the All All Self rule with specific ports on the Worker Node Security Group fully supported and recommended by AWS, or will it cause any unexpected issues with standard EKS add-ons or node lifecycle events?
  2. For the default EKS Cluster Security Group, is there any officially supported method to restrict or harden this specific All All Self rule without the EKS platform automatically reverting it during cluster updates?
  3. If this rule is strictly immutable for the managed cluster SG, is there an AWS-recommended technical mitigation or architectural pattern to address this constraint for strict security compliance audits?

Part 2 — Worker Node SG (Pending — Need Guidance ⏳)

I am now preparing to apply the same hardening to the Worker Node Security Group, and I have specific concerns before executing.

Planned Change: Remove the current All-All-Self (All traffic, All ports, Source: Self) inbound rule and replace it with these 6 specific rules (all using Source: Self / same SG ID):

Port       | Protocol | Justification
-----------|----------|------------------------------------------------------
10250      | TCP      | Kubelet API: inter-node exec/logs/metrics
53         | TCP      | CoreDNS TCP: large DNS responses (>512 bytes)
53         | UDP      | CoreDNS UDP: standard pod DNS queries
443        | TCP      | HTTPS inter-node: metrics-server, webhooks
1025–65535 | TCP      | Ephemeral TCP: pod-to-pod return traffic (VPC CNI)
1025–65535 | UDP      | Ephemeral UDP: UDP workloads across nodes (VPC CNI)

This change will be applied manually via the AWS Console (in-place, no node replacement).
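
For reference, the six planned rules can also be written down as data. The sketch below builds them in the dict shape that the boto3 EC2 `authorize_security_group_ingress` call accepts for its `IpPermissions` parameter; it only constructs the structures and makes no AWS call. The helper name `build_self_ingress` and the SG ID are hypothetical.

```python
# (protocol, from_port, to_port, description), taken from the planned rule list.
PLANNED_RULES = [
    ("tcp", 10250, 10250, "Kubelet API"),
    ("tcp", 53, 53, "CoreDNS TCP"),
    ("udp", 53, 53, "CoreDNS UDP"),
    ("tcp", 443, 443, "HTTPS inter-node"),
    ("tcp", 1025, 65535, "Ephemeral TCP"),
    ("udp", 1025, 65535, "Ephemeral UDP"),
]


def build_self_ingress(node_sg_id, rules=PLANNED_RULES):
    """Build self-referencing ingress entries (Source: the same SG ID)."""
    return [
        {
            "IpProtocol": proto,
            "FromPort": low,
            "ToPort": high,
            "UserIdGroupPairs": [{"GroupId": node_sg_id, "Description": desc}],
        }
        for proto, low, high, desc in rules
    ]


# Hypothetical placeholder ID for the Worker Node SG.
permissions = build_self_ingress("sg-0aaaaaaaaaaaaaaaa")
for entry in permissions:
    print(entry["IpProtocol"], entry["FromPort"], entry["ToPort"])
```

Writing the list this way also makes one detail visible: 10250/TCP already falls inside the 1025–65535/TCP range, so that rule documents intent rather than opening anything extra.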

Additional questions I still need answered:

Q4. In-place safety: Is it safe to modify an active Worker Node SG in-place while the nodes are running and serving production traffic? Is there a risk of momentary connection drops during the SG rule update, particularly for long-lived TCP connections between pods on different nodes?

Q5. Add-on port coverage: Are there any commonly used EKS add-ons or DaemonSets — such as VPC CNI, CoreDNS, kube-proxy, AWS Load Balancer Controller, Fluent Bit, or Spot Node Termination Handler — that require inter-node communication on ports NOT covered by my 6-rule list? I want to avoid a scenario where an add-on silently breaks after the change.

Q6. Spot Instance node lifecycle: Our cluster uses EC2 Spot instances. Does the Spot interruption and replacement flow (node drain → cordon → new node bootstrap → rejoin cluster) require any additional inter-node ports beyond what is listed above?

Q7. VPC CNI ipamd coordination: The VPC CNI plugin (amazon-vpc-cni-k8s) uses ipamd to manage pod ENIs. Beyond ephemeral ports 1025–65535 for pod-to-pod traffic, does ipamd itself use any specific port for inter-node or control-plane coordination that I might be missing?

Q8. NodePort Services: If the cluster runs NodePort-type Services (default port range 30000–32767), do these ports also need to be explicitly added to the self-reference inbound rules to allow cross-node traffic routing for NodePort?
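
As a purely arithmetic check on Q8, the sketch below tests whether a required port range is numerically covered by a set of allowed ranges. It says nothing about whether NodePort traffic actually arrives with the node SG as its source (traffic from a load balancer, for example, would come from a different source), so treat it as a sanity check only. The function name `uncovered_ports` is hypothetical.

```python
def uncovered_ports(required, allowed):
    """Return the sub-ranges of `required` not covered by `allowed`.

    All ranges are inclusive (low, high) tuples.
    """
    low, high = required
    gaps = []
    cursor = low  # first port not yet known to be covered
    for a_low, a_high in sorted(allowed):
        if a_high < cursor:
            continue  # range ends before the part we still need
        if a_low > high:
            break  # range starts after the required span
        if a_low > cursor:
            gaps.append((cursor, a_low - 1))
        cursor = max(cursor, a_high + 1)
        if cursor > high:
            break
    if cursor <= high:
        gaps.append((cursor, high))
    return gaps


# TCP ranges from the planned self-referencing rules.
planned_tcp = [(10250, 10250), (53, 53), (443, 443), (1025, 65535)]

# Default NodePort range mentioned in the question.
print(uncovered_ports((30000, 32767), planned_tcp))
```

For the planned TCP rules the result is an empty list, because 30000–32767 sits inside 1025–65535. The same holds for UDP, since the planned UDP rules use the same ephemeral range.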

Environment details:

  • EKS cluster with private endpoint only (no public endpoint)
  • Worker Nodes: EC2 Spot Instances, mixed instance types
  • VPC CNI in standard mode (not custom networking)
  • No Security Groups for Pods (SGP) feature in use
  • Changes applied manually via AWS Console (no Terraform apply during change window)

Thank you again for your guidance — the previous answers were very helpful and I want to be thorough before making changes to a production cluster.

2 Answers
Accepted Answer

The behavior you are observing is by design, and the "fix" lies in your architectural pattern rather than in modifying the default Security Group (SG).

1. Is replacing "All-All-Self" on Worker Nodes supported?

Yes, it is supported, but with a caveat. While you can define specific ports (10250 for Kubelet, 53 for CoreDNS, etc.) in your custom Node Security Groups, a port-by-port list is often difficult to maintain.

  • VPC CNI Requirement: With the VPC CNI, pods receive VPC IP addresses, and without Security Groups for Pods their traffic is evaluated against the node security group. Node-to-node communication and pod-to-pod networking therefore need a wide range of ports (typically 1025-65535).
  • Recommendation: Instead of manually listing every port, it is standard practice to allow traffic between nodes by referencing the Security Group ID as the source, rather than a CIDR block.

2. Can the default EKS Cluster SG be hardened?

No. As you’ve discovered, the EKS control plane performs a "reconciliation loop." If it detects that the mandatory rules for the eks-cluster-sg-* are missing, it will recreate them during cluster updates or maintenance. This is a safety mechanism to ensure the Control Plane never loses contact with the Nodes, which would result in a cluster failure.

3. Recommended Architectural Mitigation (The "Audit-Friendly" Way)

To satisfy a strict security audit, the best approach is to minimize the scope of the default Cluster Security Group.

  • Decouple the SGs: Do not use the default eks-cluster-sg-* for your Worker Nodes. Instead, treat that SG as an exclusive "Control Plane" group.
  • Use Custom Node SGs: Create a dedicated Security Group for your Node Groups.
  • Implement Least Privilege: In your Node SG, allow ingress from the Cluster SG only on port 10250 (Kubelet) and any ports used by admission webhooks (e.g., 8443, 9443). In your Cluster SG, allow ingress from the Node SG only on port 443 (HTTPS for the API server).
  • The Compliance Argument: By removing your Worker Nodes from the default Cluster SG, the "All-All-Self" rule becomes a non-issue. Since no other resources are members of that SG, the rule has no "targets" to allow traffic to. The "Blast Radius" is effectively zero.
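
The least-privilege cross-referencing described in the bullets above could be expressed roughly as follows. This is a sketch that only builds rule structures in the shape of boto3's `IpPermissions` parameter and does not call AWS; the function name `cross_reference_rules`, the result keys, and the SG IDs are hypothetical, and the webhook ports are the examples given above (8443, 9443), which depend on the add-ons you actually run.

```python
def cross_reference_rules(cluster_sg_id, node_sg_id, webhook_ports=(8443, 9443)):
    """Build ingress rules that reference SG IDs instead of CIDR blocks."""

    def tcp_from(source_sg_id, port):
        # Single-port TCP rule whose source is another security group.
        return {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": source_sg_id}],
        }

    return {
        # Node SG: accept kubelet and webhook traffic from the control plane.
        "node_sg_ingress": [tcp_from(cluster_sg_id, p)
                            for p in (10250, *webhook_ports)],
        # Cluster SG: accept API server traffic from the nodes.
        "cluster_sg_ingress": [tcp_from(node_sg_id, 443)],
    }


# Hypothetical placeholder IDs.
rules = cross_reference_rules("sg-0ccccccccccccccc1", "sg-0nnnnnnnnnnnnnnn1")
print([r["FromPort"] for r in rules["node_sg_ingress"]])
print([r["FromPort"] for r in rules["cluster_sg_ingress"]])
```

Keeping both directions in one function makes it easier to review the pair together during an audit, since each rule names the other group as its only source.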

Checklist for your Audit:

1. Isolate the Default SG: Only the EKS-managed ENIs (Control Plane) should be members of the eks-cluster-sg-*.

2. Explicit Cross-Referencing: Use Security Group Rules that reference IDs (Source: sg-12345) rather than 0.0.0.0/0.

3. Documentation: Document that the "All-Self" rule is a functional requirement for the AWS-managed Control Plane and is mitigated by restricted membership.
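
Checklist item 1 can be spot-checked with a small script. The sketch below takes network interface descriptions shaped like the `NetworkInterfaces` list from EC2 DescribeNetworkInterfaces and reports any ENI that is a member of the given SG and attached to an EC2 instance. It rests on an assumption you should verify in your own account: that EKS-managed control plane ENIs show no `InstanceId` in their attachment, while worker node ENIs do. The function name `instance_members` and all IDs are hypothetical.

```python
def instance_members(network_interfaces, sg_id):
    """Return (eni_id, instance_id) for instance-attached ENIs that are in `sg_id`."""
    offenders = []
    for eni in network_interfaces:
        is_member = any(g.get("GroupId") == sg_id for g in eni.get("Groups", []))
        instance_id = eni.get("Attachment", {}).get("InstanceId")
        if is_member and instance_id:
            offenders.append((eni["NetworkInterfaceId"], instance_id))
    return offenders


# Hypothetical sample data; all IDs are placeholders.
CLUSTER_SG = "sg-0ccccccccccccccc1"
sample_enis = [
    {   # ENI without an instance attachment (assumed: control plane ENI)
        "NetworkInterfaceId": "eni-0aaaaaaaaaaaaaaa1",
        "Groups": [{"GroupId": CLUSTER_SG}],
        "Attachment": {"Status": "attached"},
    },
    {   # ENI attached to an EC2 instance that is still in the cluster SG
        "NetworkInterfaceId": "eni-0bbbbbbbbbbbbbbb2",
        "Groups": [{"GroupId": CLUSTER_SG}],
        "Attachment": {"InstanceId": "i-0123456789abcdef0", "Status": "attached"},
    },
]

print(instance_members(sample_enis, CLUSTER_SG))
```

An empty result would support the "restricted membership" argument for the audit; any entry returned points to an instance that still shares the default Cluster SG.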

I hope this helps you clear your security audit while maintaining a stable EKS environment!

EXPERT
answered a month ago
AWS
EXPERT
reviewed a month ago

The All-All-Self rule on eks-cluster-sg-* is AWS-managed and will be recreated on every cluster update. The solution is to make it irrelevant through architecture.

Worker Node SG

Your port list (10250, 53, 443, 1025–65535) is correct. Reference Security Group IDs as the source rather than CIDR blocks for precision and maintainability.

Cluster SG

Remove worker nodes from eks-cluster-sg-* entirely. Assign them a dedicated Node Security Group and apply least-privilege cross-referencing:

  • Node SG → Cluster SG: TCP 443
  • Cluster SG → Node SG: TCP 10250, plus admission webhook ports (8443, 9443)

With no worker nodes as members, the All-All-Self rule has no targets and the blast radius is effectively zero.

For compliance audits

Document the rule as a non-negotiable AWS platform requirement for Control Plane stability, mitigated by restricted SG membership. This satisfies least-privilege intent without compromising cluster stability.

References

  • EKS Security Group requirements: https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html
  • EKS Best Practices – Network Security: https://docs.aws.amazon.com/eks/latest/best-practices/network-security.html
  • Security Groups for Pods: https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html

answered a month ago
EXPERT
reviewed a month ago
