The behavior you are observing is by design, and the "fix" lies in your architectural pattern rather than in modifying the default Security Group (SG).
1. Is replacing "All-All-Self" on Worker Nodes supported?
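Yes, it is supported, but with a caveat. You can define specific ports (10250 for Kubelet, 53 TCP/UDP for CoreDNS, etc.) in your custom Node Security Groups, but an explicit port list is often difficult to maintain.
- VPC CNI Requirement: With the VPC CNI, pods receive VPC IP addresses and, unless you use Security Groups for Pods, share the node's security groups. Pod-to-pod and node-to-node traffic on application ports is therefore evaluated against the Node SG, which is why a wide port range (typically 1025-65535) is usually allowed between nodes.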
- Recommendation: Instead of manually listing every port, it is standard practice to allow traffic between nodes by referencing the Security Group ID as the source, rather than a CIDR block.
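As a minimal sketch of the self-referencing pattern described above, the snippet below builds ingress permissions for the ports mentioned in this answer, with the Node SG itself as the source. The dictionaries follow the `IpPermissions` shape used by the EC2 API; the group ID and the helper name `self_referencing_rule` are hypothetical placeholders, and nothing here calls AWS.

```python
from typing import Any


def self_referencing_rule(sg_id: str, protocol: str, from_port: int,
                          to_port: int, description: str) -> dict[str, Any]:
    """Build one ingress permission whose source is the group itself."""
    return {
        "IpProtocol": protocol,
        "FromPort": from_port,
        "ToPort": to_port,
        # The source is a security group ID, not a CIDR block.
        "UserIdGroupPairs": [{"GroupId": sg_id, "Description": description}],
    }


# Hypothetical placeholder; substitute your own Node SG ID.
NODE_SG_ID = "sg-0123456789abcdef0"

# Ports taken from the answer above; adjust to your workloads.
node_to_node_rules = [
    self_referencing_rule(NODE_SG_ID, "tcp", 10250, 10250, "kubelet"),
    self_referencing_rule(NODE_SG_ID, "tcp", 53, 53, "CoreDNS (TCP)"),
    self_referencing_rule(NODE_SG_ID, "udp", 53, 53, "CoreDNS (UDP)"),
    self_referencing_rule(NODE_SG_ID, "tcp", 1025, 65535, "pod-to-pod"),
]
```

If you use boto3, a list like this could be passed as `IpPermissions` to `authorize_security_group_ingress(GroupId=NODE_SG_ID, IpPermissions=node_to_node_rules)`; the same structure maps directly onto Terraform or CloudFormation rules.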
2. Can the default EKS Cluster SG be hardened?
No. As you’ve discovered, the EKS control plane performs a "reconciliation loop." If it detects that the mandatory rules for the eks-cluster-sg-* are missing, it will recreate them during cluster updates or maintenance. This is a safety mechanism to ensure the Control Plane never loses contact with the Nodes, which would result in a cluster failure.
3. Recommended Architectural Mitigation (The "Audit-Friendly" Way)
To satisfy a strict security audit, the best approach is to minimize the scope of the default Cluster Security Group.
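- Decouple the SGs: Do not use the default eks-cluster-sg-* for your Worker Nodes. Instead, treat that SG as an exclusive "Control Plane" group. For managed node groups, this generally means supplying a launch template that specifies your own security groups, since the cluster SG is otherwise attached to the nodes by default.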
- Use Custom Node SGs: Create a dedicated Security Group for your Node Groups.
- Implement Least Privilege: In your Node SG, allow Ingress from the Cluster SG only on ports 10250 (Kubelet) and any ports used by admission webhooks (e.g., 8443, 9443).
- In your Cluster SG, allow Ingress from the Node SG only on port 443 (HTTPS for the API Server).
- The Compliance Argument: By removing your Worker Nodes from the default Cluster SG, the "All-All-Self" rule becomes a non-issue. Since no other resources are members of that SG, the rule has no "targets" to allow traffic to. The "Blast Radius" is effectively zero.
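The cross-referencing in the bullets above can be sketched as data. This is an illustrative example, not an official AWS template: the helper names and the `sg-cluster` / `sg-node` IDs are hypothetical, and the webhook ports are the examples given above (8443, 9443), which you should replace with whatever your admission webhooks actually listen on.

```python
from typing import Any


def sg_source_rule(source_sg_id: str, port: int, description: str) -> dict[str, Any]:
    """Build a single-port TCP ingress permission sourced from another SG."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_sg_id, "Description": description}],
    }


def least_privilege_rules(cluster_sg_id: str, node_sg_id: str,
                          webhook_ports: tuple[int, ...] = (8443, 9443)
                          ) -> dict[str, list[dict[str, Any]]]:
    """Return the ingress permissions to attach to each SG, keyed by SG ID."""
    # Node SG: accept kubelet and webhook traffic from the control plane only.
    node_ingress = [sg_source_rule(cluster_sg_id, 10250, "control plane to kubelet")]
    node_ingress += [
        sg_source_rule(cluster_sg_id, port, "control plane to admission webhook")
        for port in webhook_ports
    ]
    # Cluster SG: accept API server traffic from the nodes only.
    cluster_ingress = [sg_source_rule(node_sg_id, 443, "nodes to API server")]
    return {node_sg_id: node_ingress, cluster_sg_id: cluster_ingress}


rules = least_privilege_rules("sg-cluster", "sg-node")  # hypothetical IDs
```

Keeping the rule set in one function makes it easy to show an auditor exactly which ports cross the boundary between the two groups.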
Checklist for your Audit:
1. Isolate the Default SG: Only the EKS-managed ENIs (Control Plane) should be members of the eks-cluster-sg-*.
2. Explicit Cross-Referencing: Use Security Group Rules that reference IDs (Source: sg-12345) rather than 0.0.0.0/0.
3. Documentation: Document that the "All-Self" rule is a functional requirement for the AWS-managed Control Plane and is mitigated by restricted membership.
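The first two checklist items lend themselves to an automated check. The sketch below works on plain dictionaries shaped like the output of the EC2 `describe_network_interfaces` and `describe_security_groups` calls, so it can be run against exported data. One loud assumption: it treats an ENI as EKS-managed when its `Description` starts with "Amazon EKS"; verify that marker against your own account before relying on it. The function name `audit_cluster_sg` is hypothetical.

```python
from typing import Any

OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def audit_cluster_sg(cluster_sg_id: str,
                     network_interfaces: list[dict[str, Any]],
                     security_groups: list[dict[str, Any]],
                     managed_prefix: str = "Amazon EKS") -> list[str]:
    """Return human-readable findings; an empty list means both checks passed."""
    findings = []

    # Checklist item 1: only EKS-managed ENIs may be members of the cluster SG.
    # ASSUMPTION: managed ENIs are recognised by their Description prefix.
    for eni in network_interfaces:
        is_member = any(g.get("GroupId") == cluster_sg_id
                        for g in eni.get("Groups", []))
        if is_member and not eni.get("Description", "").startswith(managed_prefix):
            findings.append(
                f"{eni['NetworkInterfaceId']}: non-EKS-managed ENI is a member "
                f"of {cluster_sg_id}"
            )

    # Checklist item 2: no ingress rule may be open to the whole internet.
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                if cidr in OPEN_CIDRS:
                    findings.append(f"{sg['GroupId']}: ingress open to {cidr}")

    return findings
```

The returned findings can be attached to the audit evidence alongside the written justification from item 3.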
I hope this helps you clear your security audit while maintaining a stable EKS environment!
The All-All-Self rule on eks-cluster-sg-* is AWS-managed and will be recreated on every cluster update. The solution is to make it irrelevant through architecture.
Worker Node SG: Your port list (10250, 53, 443, 1025–65535) is correct. Reference Security Group IDs as the source rather than CIDR blocks for precision and maintainability.
Cluster SG: Remove worker nodes from eks-cluster-sg-* entirely. Assign them a dedicated Node Security Group and apply least-privilege cross-referencing:
- Node SG → Cluster SG: TCP 443
- Cluster SG → Node SG: TCP 10250, plus admission webhook ports (8443, 9443)
With no worker nodes as members, the All-All-Self rule has no targets and the blast radius is effectively zero.
For compliance audits: Document the rule as a non-negotiable AWS platform requirement for Control Plane stability, mitigated by restricted SG membership. This satisfies least-privilege intent without compromising cluster stability.
References:
- EKS Security Group requirements: https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html
- EKS Best Practices – Network Security: https://docs.aws.amazon.com/eks/latest/best-practices/network-security.html
- Security Groups for Pods: https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html
