Troubleshooting Kubernetes network policies: Enforcing policies in Amazon EKS
This article explains how to troubleshoot Kubernetes network policies in Amazon Elastic Kubernetes Service (Amazon EKS).
Introduction
When a Kubernetes network policy in Amazon EKS doesn’t behave as you configured it, it can be difficult to determine the cause. Possible causes include the following:
- A misconfigured label selector.
- A problem in the control plane when it resolves selectors into IP addresses.
- A problem with enforcement at the Linux kernel level.
Sometimes network behavior doesn’t match what you see in your configuration files. When you check the rules on your worker nodes, you can see exactly where traffic is blocked. This visibility helps you find the root cause of network problems so that you can quickly fix issues and make sure that your security rules work as expected.
Unlike traditional iptables-based implementations, the Amazon Virtual Private Cloud (Amazon VPC) CNI enforces network policies with eBPF programs that are attached directly to pods’ network interfaces. Because enforcement happens inside the Linux kernel, conventional troubleshooting techniques, such as inspecting iptables rules, don’t show the rules that are actually applied.
This article explains how Amazon EKS implements network policies and provides a structured troubleshooting procedure to verify enforcement at the kernel level.
Architecture overview
With Amazon VPC CNI version 1.14 and later, Amazon EKS uses eBPF to implement network policy enforcement, as seen in Figure 1.
Figure 1: Architecture overview.
The solution consists of the following parts.
Control plane
The network policy controller runs in the managed Amazon EKS control plane and is responsible for the following tasks:
- Monitor Kubernetes NetworkPolicy resources.
- Resolve pod and namespace selectors.
- Convert selectors into IP-based rules.
- Create or update a PolicyEndpoint custom resource.
The PolicyEndpoint custom resource definition (CRD) acts as a bridge between the control plane and the worker nodes.
For more information, see amazon-network-policy-controller-k8s on the GitHub website.
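To make the controller’s job concrete, the following sketch imitates the selector-to-IP conversion for a single key=value pod selector. It’s an illustration, not the controller’s code. The function name resolve_selector and its input format (one pod per line: name, IP address, and comma-separated labels) are hypothetical. In a real cluster, you can see the controller’s actual output with kubectl get policyendpoints -A.

```shell
# Hypothetical sketch: resolve one "key=value" pod selector into /32 rules,
# similar in spirit to what the controller stores in a PolicyEndpoint.
# Input (stdin), one pod per line: <name> <ip> <label1=v1,label2=v2,...>
resolve_selector() {
  selector="$1"   # for example: access=true
  while read -r name ip labels; do
    [ -n "$name" ] || continue          # skip blank lines
    # Wrap the label list in commas so that only whole labels match.
    case ",$labels," in
      *",$selector,"*) printf '%s/32\n' "$ip" ;;
    esac
  done
}
```

For the pods in the example later in this article, piping the lines accessfalse 172.31.96.116 access=false and accesstrue 172.31.101.218 access=true into resolve_selector access=true prints only 172.31.101.218/32.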
Data plane
On each worker node, the following components handle enforcement:
- The AWS Network Policy Agent, which runs in the aws-node DaemonSet
- eBPF programs that are attached to pods’ virtual Ethernet (veth) interfaces
- eBPF maps that store the resolved rules
The AWS Network Policy Agent does the following:
- Watches PolicyEndpoint objects.
- Loads eBPF programs and attaches them to each pod’s host-side veth interface.
- Maintains BPF maps that contain the allowed IP address prefixes, protocol and port rules, connection tracking information, and policy state.
For more information, see aws-network-policy-agent on the GitHub website.
Packet filtering uses Traffic Control (TC) hooks and occurs inside the Linux kernel. Packet filtering includes the following steps:
- Ingress receipt: When a packet that’s destined for a pod arrives at the node, the eBPF program that’s attached to the TC egress hook of the pod’s host-side veth interface processes it.
- Connection tracking: The eBPF program checks the aws_conntrack_map. If the connection is already established, then the packet follows the fast path and is allowed without a new policy lookup. For a new connection, the program evaluates the rules in the policy map (ingress_map or egress_map).
- Verdict: If the rules allow the packet, then the program returns TC_ACT_OK and the packet continues. If the rules don’t allow the packet, then the program returns TC_ACT_SHOT, and the kernel drops the packet.
- Egress transmission: The program that’s attached to the TC ingress hook of the host-side veth interface evaluates outgoing traffic from the pod in the same way.
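The decision order above can be modeled in user space as follows. This is a simplified sketch in POSIX shell, not the agent’s eBPF code. It considers only the source IPv4 address, ignores protocols, ports, and except clauses, and uses hypothetical function names and space-separated lists in place of the real aws_conntrack_map and policy maps. The strings TC_ACT_OK and TC_ACT_SHOT stand in for the kernel’s tc action codes.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  # shellcheck disable=SC2086
  set -- $1                 # split "a.b.c.d" into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if IPv4 address $1 falls inside CIDR $2 (for example, 172.31.0.0/16).
ip_in_cidr() {
  net=${2%/*}
  len=${2#*/}
  [ "$len" -eq 0 ] && return 0
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Hypothetical model of the verdict for a packet from source IP $1.
#   $2: space-separated source IPs that have established (tracked) connections
#   $3: space-separated allowed CIDRs (stand-in for the policy map)
verdict() {
  src=$1
  # 1. Connection tracking fast path: established flows are allowed.
  for tracked in $2; do
    if [ "$tracked" = "$src" ]; then echo TC_ACT_OK; return; fi
  done
  # 2. New flow: look up the source address in the allowed prefixes.
  for cidr in $3; do
    if ip_in_cidr "$src" "$cidr"; then echo TC_ACT_OK; return; fi
  done
  # 3. No matching rule: drop.
  echo TC_ACT_SHOT
}
```

With the allowlist from the example later in this article, verdict 172.31.96.116 "" "172.31.101.218/32 172.31.103.129/32" prints TC_ACT_SHOT, which matches the behavior of the accessfalse pod.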
Implementing the solution
To implement the solution, complete the following tasks.
Prerequisites
- An Amazon EKS cluster that runs Kubernetes version 1.25 or later
- Amazon VPC CNI add-on version 1.14.0 or later
- kubectl installed
- AWS Command Line Interface (AWS CLI) installed
Note: Unless a step says otherwise, run all kubectl and AWS CLI commands from your local terminal. You must configure kubectl to connect to your EKS cluster.
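If you aren’t sure which add-on version is installed, the following sketch compares it against the 1.14.0 minimum. It assumes that the Amazon VPC CNI is installed as an Amazon EKS add-on, so that aws eks describe-addon can report it, and that versions have three numeric components, for example v1.18.3-eksbuild.1. The function names are hypothetical.

```shell
# Hypothetical helper: succeed if add-on version $1 (for example, v1.18.3-eksbuild.1)
# is at least $2 (for example, 1.14.0). Assumes three numeric components.
version_at_least() {
  have=${1#v}; have=${have%%-*}   # drop a leading "v" and any "-eksbuild.N" suffix
  want=${2#v}; want=${want%%-*}
  old_ifs=$IFS
  IFS=.
  # shellcheck disable=SC2086
  set -- $have $want              # $1-$3: installed version, $4-$6: required version
  IFS=$old_ifs
  [ "$1" -gt "$4" ] && return 0
  [ "$1" -lt "$4" ] && return 1
  [ "$2" -gt "$5" ] && return 0
  [ "$2" -lt "$5" ] && return 1
  [ "$3" -ge "$6" ]
}

# Hypothetical wrapper: query the installed vpc-cni add-on version for cluster $1.
check_vpc_cni_version() {
  version=$(aws eks describe-addon --cluster-name "$1" --addon-name vpc-cni \
    --query 'addon.addonVersion' --output text) || return 1
  if version_at_least "$version" 1.14.0; then
    echo "vpc-cni $version supports network policy"
  else
    echo "vpc-cni $version is older than 1.14.0; update the add-on first" >&2
    return 1
  fi
}
```

For example, check_vpc_cni_version <cluster-name> prints the installed version, or returns a non-zero status if the add-on is too old.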
Turn on network policy
To turn on network policy, run the following command:
aws eks update-addon \
--cluster-name <cluster-name> \
--addon-name vpc-cni \
--configuration-values '{"enableNetworkPolicy":"true"}'
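To confirm that the update took effect, you can wait for the add-on to become active and then read back its configuration. The following sketch uses aws eks wait addon-active and aws eks describe-addon. The helper names are hypothetical, and the string check expects the "enableNetworkPolicy":"true" form that this article uses.

```shell
# Hypothetical helper: succeed if an add-on configuration JSON string turns on
# network policy. Whitespace is removed first, so {"enableNetworkPolicy": "true"} also matches.
policy_enabled_in_config() {
  compact=$(printf '%s' "$1" | tr -d ' \t\n')
  case "$compact" in
    *'"enableNetworkPolicy":"true"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: wait for the vpc-cni update on cluster $1, then check its configuration.
verify_network_policy_config() {
  aws eks wait addon-active --cluster-name "$1" --addon-name vpc-cni || return 1
  config=$(aws eks describe-addon --cluster-name "$1" --addon-name vpc-cni \
    --query 'addon.configurationValues' --output text) || return 1
  if policy_enabled_in_config "$config"; then
    echo "network policy is turned on: $config"
  else
    echo "enableNetworkPolicy is not set to \"true\": $config" >&2
    return 1
  fi
}
```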
Choose your enforcement mode
Amazon VPC CNI supports two enforcement modes. Standard mode is the default: a pod allows all traffic until a network policy selects it.
In strict mode, a pod denies all traffic unless a network policy explicitly allows it.
To turn on Strict mode, run the following command:
aws eks update-addon \
--cluster-name <cluster-name> \
--addon-name vpc-cni \
--configuration-values '{"enableNetworkPolicy":"true","env":{"NETWORK_POLICY_ENFORCING_MODE":"strict"}}'
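If you switch between modes, generating the configuration string helps you avoid JSON quoting mistakes. This hypothetical helper prints the strict-mode value shown above, or the same structure with standard as the mode. It assumes that standard and strict are the only NETWORK_POLICY_ENFORCING_MODE values you need.

```shell
# Hypothetical helper: print the vpc-cni --configuration-values JSON for mode $1.
vpc_cni_config() {
  case "$1" in
    standard|strict)
      printf '{"enableNetworkPolicy":"true","env":{"NETWORK_POLICY_ENFORCING_MODE":"%s"}}\n' "$1" ;;
    *)
      echo "usage: vpc_cni_config standard|strict" >&2
      return 1 ;;
  esac
}
```

For example: aws eks update-addon --cluster-name <cluster-name> --addon-name vpc-cni --configuration-values "$(vpc_cni_config strict)".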
In strict mode, note the following behavior:
- Pods start in a default-deny state.
- You must explicitly allow DNS traffic.
- Node IP addresses are automatically allowed so that kubelet health probes succeed.
- You must explicitly allow all other required communication. If you don’t, applications can start to fail.
Important: It’s a best practice to validate these dependencies in a test environment before you implement the solution in a production environment.
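Because strict mode blocks DNS by default, a common first policy allows DNS egress to CoreDNS. The following sketch prints such a manifest for one namespace so that you can pipe it to kubectl apply -f -. It assumes that CoreDNS runs in kube-system with the k8s-app: kube-dns label (the Amazon EKS default), and it relies on the kubernetes.io/metadata.name label that Kubernetes adds to every namespace. The policy name allow-dns-egress and the function name are hypothetical.

```shell
# Hypothetical helper: print a NetworkPolicy that lets every pod in namespace $1
# send DNS queries (UDP and TCP port 53) to CoreDNS in kube-system.
dns_egress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
EOF
}
```

For example, dns_egress_policy default | kubectl apply -f - applies the policy to the default namespace. Other traffic stays denied until other policies allow it.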
Troubleshooting network policies
To troubleshoot network policies, complete the following tasks:
- Confirm that the NetworkPolicy exists.
- Review the PolicyEndpoint resolution.
- Use tc to confirm that the eBPF programs are attached.
- Inspect the eBPF maps on the worker node to confirm that they contain the expected rules.
For more information on Amazon EKS best practices, see the Amazon EKS Best Practices Guide.
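The first two checks run from your local terminal and can be scripted. The following sketch reports PASS or FAIL for each one. The helper names are hypothetical, and the defaults (the default namespace and the access-nginx policy) come from the example later in this article. The tc and eBPF map checks still require a shell on the worker node.

```shell
# Hypothetical helper: run a command quietly and report whether it succeeded.
#   $1: description; remaining arguments: the command to run
run_check() {
  desc=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "[PASS] $desc"
  else
    echo "[FAIL] $desc"
    return 1
  fi
}

# Succeed if namespace $1 contains at least one PolicyEndpoint resource.
has_policyendpoints() {
  [ -n "$(kubectl get policyendpoints -n "$1" -o name 2>/dev/null)" ]
}

# Steps 1 and 2 of the procedure: the NetworkPolicy exists and has been resolved.
check_policy_resolution() {
  ns=${1:-default}
  policy=${2:-access-nginx}
  run_check "NetworkPolicy $policy exists in namespace $ns" \
    kubectl get networkpolicy "$policy" -n "$ns"
  run_check "PolicyEndpoint resources exist in namespace $ns" \
    has_policyendpoints "$ns"
}
```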
Solution example: Reviewing a denied connection
In the following example, a pod that has the access=false label can’t reach an NGINX service. The following steps show how to confirm that the connection is blocked at the kernel level.
The following are the configurations for the example:
- The pods of the target NGINX deployment have the app=nginx label.
- The network policy allows ingress traffic from only pods that have the access=true label.
- The investigation target is a pod that has the access=false label and the IP address 172.31.96.116.
Apply the network policy
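Save the following example policy as networkpolicy.yaml, and then apply it. The policy selects the NGINX pods and allows ingress traffic only from pods that have the access=true label: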
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: access-nginx
spec:
  podSelector:
    matchLabels:
      app: nginx
  ingress:
  - from:
    - podSelector:
        matchLabels:
          access: "true"
$ kubectl apply -f networkpolicy.yaml
Create test workloads
The following commands deploy NGINX, expose it as a service on port 80, and create two test pods. The policy allows traffic from the accesstrue pod, which has the access=true label, and denies traffic from the accessfalse pod, which has the access=false label.
Example:
$ kubectl create deploy nginx --image=nginx --replicas=1
$ kubectl expose deploy nginx --port 80
$ kubectl run accesstrue --image=busybox --labels access=true -- sleep infinity
$ kubectl run accessfalse --image=busybox --labels access=false -- sleep infinity
After you create the pods, run the following command to check the running state of the pods and note the IP addresses:
$ kubectl get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                                                NOMINATED NODE   READINESS GATES
accessfalse             1/1     Running   0          15m   172.31.96.116    ip-172-31-103-129.ap-northeast-1.compute.internal   <none>           <none>
accesstrue              1/1     Running   0          15m   172.31.101.218   ip-172-31-103-129.ap-northeast-1.compute.internal   <none>           <none>
nginx-f8d9c6576-8b4vf   1/1     Running   0          29m   172.31.104.5     ip-172-31-103-129.ap-northeast-1.compute.internal   <none>           <none>
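To reuse these IP addresses in later commands, you can extract them from the output. The following hypothetical helper prints the first field that looks like an IPv4 address on the pod’s row. This approach doesn’t depend on column positions, which can shift because the RESTARTS column can include text such as "1 (3m ago)". Against a live cluster, kubectl get pod accessfalse -o jsonpath='{.status.podIP}' is the more robust option.

```shell
# Hypothetical helper: read "kubectl get pods -o wide" output on stdin and
# print the IP address of pod $1 (the first IPv4-looking field on its row).
pod_ip_from_wide() {
  awk -v pod="$1" '
    NR > 1 && $1 == pod {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) { print $i; exit }
    }'
}
```

For example, kubectl get pods -o wide | pod_ip_from_wide accessfalse prints 172.31.96.116 for the output above.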
Test the connectivity
Run wget from each test pod to check connectivity to the NGINX service. The request from the accesstrue pod succeeds, but the request from the accessfalse pod times out.
accesstrue pod output:
$ kubectl exec accesstrue -- wget --spider -T 1 nginx
Connecting to nginx (172.31.104.5:80)
remote file exists
accessfalse pod output:
$ kubectl exec accessfalse -- wget --spider -T 1 nginx
Connecting to nginx (172.31.104.5:80)
wget: download timed out
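When you test several pods, a small helper can label each result. The following sketch classifies BusyBox wget --spider output. The message remote file exists means that the request succeeded. A timeout is consistent with a packet that was silently dropped, as TC_ACT_SHOT does. A refused connection means that a host answered, so it points away from a policy drop. The function names are hypothetical, and the matched strings are based on the output shown above.

```shell
# Hypothetical helper: classify BusyBox "wget --spider" output passed as $1.
classify_wget() {
  case "$1" in
    *"remote file exists"*) echo "allowed" ;;
    *"timed out"*)          echo "blocked: timeout, consistent with a silent policy drop" ;;
    *"Connection refused"*) echo "refused: a host answered, so not a silent drop" ;;
    *)                      echo "unknown" ;;
  esac
}

# Hypothetical wrapper: test from pod $1 to target $2 (default: nginx).
# 2>&1 also captures wget's error messages.
test_from_pod() {
  classify_wget "$(kubectl exec "$1" -- wget --spider -T 1 "${2:-nginx}" 2>&1)"
}
```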
Review the kernel
To understand why the accessfalse pod is blocked, inspect the ingress map of the target NGINX pod on the worker node.
On the node where the NGINX pod is running, run the following commands to start a debug pod and switch to the node’s root file system:
$ kubectl debug node/ip-172-31-103-129.ap-northeast-1.compute.internal -it --image=amazonlinux:2023 --profile=sysadmin
bash-5.2# chroot /host
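The tc commands that follow need the name of the NGINX pod’s host-side veth interface (eni9c515deae49 in this example). On nodes that use the Amazon VPC CNI, each local pod IP address typically has a host route through its veth interface, so one way to find the name is to query the routing table. The following sketch does that in the debug shell. The helper names are hypothetical.

```shell
# Hypothetical helper: print the interface that follows "dev" in "ip route" output (stdin).
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical wrapper: find the host-side interface for pod IP $1 (run on the node).
veth_for_pod_ip() {
  ip route get "$1" | route_dev
}
```

For example, veth_for_pod_ip 172.31.104.5 prints the interface name to pass to tc filter show.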
Verify tc attachment
Because tc might not be preinstalled on EKS-optimized Amazon Machine Images (AMIs), run the following command to install it. Because you ran chroot /host, dnf installs the package on the node’s root file system, not only in the debug container:
dnf install iproute-tc -y
Then, review the tc filters on the NGINX pod’s host-side veth interface (eni9c515deae49 in this example) to verify that the eBPF programs are attached:
sh-5.2# tc filter show dev eni9c515deae49 ingress
filter protocol all pref 1 bpf chain 0
filter protocol all pref 1 bpf chain 0 handle 0x1 handle_egress direct-action not_in_hw id 433 name tag 557a80b8403d758b jited
sh-5.2# tc filter show dev eni9c515deae49 egress
filter protocol all pref 1 bpf chain 0
filter protocol all pref 1 bpf chain 0 handle 0x1 handle_ingress direct-action not_in_hw id 432 name tag aeee5ad55207c8b0 jited
Note: In tc, direction is relative to the host. The pod’s egress is tc ingress on veth, and the pod’s ingress is tc egress on veth.
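To compare the attached programs with the program IDs that the agent reports in the next step, you can extract the ID and section name from each BPF filter line. The following parser is a hypothetical sketch that’s based on the output format above, which can vary between iproute2 versions. It also encodes the direction rule from the preceding note.

```shell
# Hypothetical helper: read "tc filter show" output on stdin and print
# "<prog id> <section name>" for each BPF filter that has an id.
tc_bpf_progs() {
  awk '/ bpf / {
    id = ""; name = ""
    for (i = 1; i < NF; i++) {
      if ($i == "id") id = $(i + 1)
      if ($i == "direct-action") name = $(i - 1)
    }
    if (id != "") print id, name
  }'
}

# Map a tc hook on the host-side veth to the pod-side direction that it enforces.
pod_direction() {
  case "$1" in
    ingress) echo egress ;;   # tc ingress on the veth = traffic leaving the pod
    egress)  echo ingress ;;  # tc egress on the veth = traffic entering the pod
    *) return 1 ;;
  esac
}
```

For the output above, tc filter show dev eni9c515deae49 egress | tc_bpf_progs prints 432 handle_ingress, which matches Prog ID 432 for the ingress direction in the next step.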
Inspect eBPF maps
Run the following command to identify the eBPF program and map IDs for the NGINX pod:
sh-5.2# /opt/cni/bin/aws-eks-na-cli ebpf loaded-ebpfdata | grep nginx-f8d9c6576-default -A11
PinPath: /sys/fs/bpf/globals/aws/programs/nginx-f8d9c6576-default_handle_egress
Pod Identifier : nginx-f8d9c6576-default Direction : egress
Prog ID: 433
Associated Maps ->
Map Name: egress_pod_state_map
Map ID: 96
Map Name: policy_events
Map ID: 22
Map Name: aws_conntrack_map
Map ID: 21
Map Name: egress_map
Map ID: 95
========================================================================================
PinPath: /sys/fs/bpf/globals/aws/programs/nginx-f8d9c6576-default_handle_ingress
Pod Identifier : nginx-f8d9c6576-default Direction : ingress
Prog ID: 432
Associated Maps ->
Map Name: policy_events
Map ID: 22
Map Name: aws_conntrack_map
Map ID: 21
Map Name: ingress_map
Map ID: 93
Map Name: ingress_pod_state_map
Map ID: 94
========================================================================================
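Rather than reading map IDs by eye, you can extract them. The following hypothetical helper reads loaded-ebpfdata output and prints the Map ID for a given pod identifier, direction, and map name. It relies on the field labels shown above (Pod Identifier, Direction, Map Name, Map ID), which might change between agent versions.

```shell
# Hypothetical helper: read "aws-eks-na-cli ebpf loaded-ebpfdata" output on stdin and
# print the Map ID of map $3 for pod identifier $1 and direction $2 (ingress or egress).
map_id_for() {
  awk -v pod="$1" -v dir="$2" -v map="$3" '
    /Pod Identifier :/ { in_pod = ($4 == pod && $7 == dir) }
    in_pod && $1 == "Map" && $2 == "Name:" { current = $3 }
    in_pod && $1 == "Map" && $2 == "ID:" && current == map { print $3; exit }
  '
}
```

For the output above, /opt/cni/bin/aws-eks-na-cli ebpf loaded-ebpfdata | map_id_for nginx-f8d9c6576-default ingress ingress_map prints 93, which you can pass to ebpf dump-maps.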
Run the following command to dump the ingress_map (ID: 93) to find the IP addresses that the policy allows to reach the NGINX pod:
sh-5.2# /opt/cni/bin/aws-eks-na-cli ebpf dump-maps 93
Key : IP/Prefixlen - 172.31.101.218/32
-------------------
Value Entry : 0
Protocol - ANY PROTOCOL
StartPort - 0
Endport - 0
-------------------
* Key : IP/Prefixlen - 172.31.103.129/32
-------------------
Value Entry : 0
Protocol - ANY PROTOCOL
StartPort - 0
Endport - 0
-------------------
* Done reading all entries
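The following sketch automates the comparison that the analysis makes by hand. It reads dump-maps output, checks whether a source IPv4 address falls inside any IP/Prefixlen key, and prints the matching key. It’s a hypothetical helper that depends on the Key : IP/Prefixlen - ... line format shown above and ignores the protocol and port values.

```shell
# Hypothetical helper: read "aws-eks-na-cli ebpf dump-maps <id>" output on stdin.
# Print "allowed by <cidr>" if IPv4 address $1 is inside a key, else "not in map".
check_ip_in_dump() {
  awk -v ip="$1" '
    function to_int(addr,   o) {
      split(addr, o, ".")
      return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    /IP\/Prefixlen -/ {
      cidr = $NF                       # for example, 172.31.101.218/32
      split(cidr, part, "/")
      block = 2 ^ (32 - part[2])       # number of addresses in the prefix
      if (int(to_int(ip) / block) == int(to_int(part[1]) / block)) {
        print "allowed by " cidr
        found = 1
        exit
      }
    }
    END { if (!found) print "not in map" }
  '
}
```

For the preceding dump, check_ip_in_dump 172.31.96.116 prints not in map, and check_ip_in_dump 172.31.101.218 prints allowed by 172.31.101.218/32.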
Analyze the results
In the preceding map dump, the IP address of the accesstrue pod (172.31.101.218/32) is present, along with the node IP address (172.31.103.129/32) that’s allowed for kubelet health probes. However, the IP address of the accessfalse pod (172.31.96.116) isn’t present. Because the source IP address of the accessfalse pod isn’t in the eBPF allowlist, the eBPF program returns a TC_ACT_SHOT (DROP) verdict. This verdict means that the kernel drops the packet before it reaches the NGINX container.
Cleaning up
When you finish testing, delete the resources that you created to avoid unnecessary costs and keep your cluster clean.
To delete the NGINX deployment, run the following command:
kubectl delete deploy nginx
To delete the NGINX service, run the following command:
kubectl delete svc nginx
To delete the test pods, run the following command:
kubectl delete pod accesstrue accessfalse
To delete the network policy, run the following command:
kubectl delete networkpolicy access-nginx
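If you repeat the example, you might prefer one cleanup function. The following sketch runs the same four deletions with --ignore-not-found, so it doesn’t fail when a resource is already gone. Setting DRY_RUN=1, a hypothetical variable that only this sketch reads, prints the commands instead of running them.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Delete everything that the example created in the default namespace.
cleanup_example() {
  run kubectl delete deploy nginx --ignore-not-found
  run kubectl delete svc nginx --ignore-not-found
  run kubectl delete pod accesstrue accessfalse --ignore-not-found
  run kubectl delete networkpolicy access-nginx --ignore-not-found
}
```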
Additional support
For additional support, contact AWS Support and include the following information in your request:
- Cluster name and AWS Region
- The output of kubectl describe policyendpoint YourEndpointName
- Logs from the aws-node pod on the affected worker node
- The output of the ebpf dump-maps command
Conclusion
Amazon VPC CNI uses eBPF programs attached directly to pod network interfaces to enforce network policies. The authoritative enforcement state is the contents of the eBPF maps that are used by the programs on the pod’s veth interface, not the YAML definition.
To make sure that your network policy correctly functions, review the following configurations:
- NetworkPolicy definition
- PolicyEndpoint resolution
- eBPF map contents
- tc attachment
About the author
Kazuto Okada
Kazuto Okada is a Cloud Support Engineer on the AWS Deployment Support team. He’s passionate about Kubernetes networking and security.