EC2 throughput drops when running performance tests with JMeter on an EKS cluster


I have an EKS cluster with 12 nodes, where 1 node acts as a web server and the other nodes generate HTTP traffic using JMeter slaves. I fetch a 10 MB file and the homepage, which is a few KB in size. The test setup is exactly like this: https://blog.kubernauts.io/load-testing-as-a-service-with-jmeter-on-kubernetes-fc5288bb0c8b

When I run the test with JMeter right after bringing up the nodes, I get the expected response time and throughput on the first run (I have a baseline that I captured a couple of months ago, which I am comparing against). When I rerun the same test on the same nodes a few minutes later, I don't get the desired performance numbers: throughput is low and response time is high. I am using t3.xlarge nodes with Kubernetes v1.27. Is there any rate limit which could be causing the performance drop?

I have attached the NetworkOut graph from the web server EC2 instance. In the first test it served about 37 GB of data; subsequent tests served significantly less.

[Image: NetworkOut graph from the web server EC2 instance]

Node group info: [Image: node group configuration]

1 Answer

There could be several reasons why the performance of your EKS cluster might decrease over time. Here are some possibilities:

  • Network Throttling on t3.xlarge instances: AWS EC2 instances come with a defined network performance capacity. For the t3.xlarge instance type this is listed as "Up to 5 Gbps", but that figure is a burst ceiling: the sustained baseline bandwidth is considerably lower, and once the instance's network burst allowance is used up, throughput is held to the baseline. A first test run can therefore reach burst speed while later runs are throttled. If your tests are generating a large amount of network traffic, this alone could explain the drop; you can confirm it with the ENA driver's allowance-exceeded counters (see the sketch after this list).

  • Burstable Performance Instances (CPU credits): AWS t3 instances are burstable performance instances, which means that they provide a baseline level of CPU performance with the ability to burst above the baseline. If you're exhausting your CPU credits, it might impact your test performance over time. You can check your CPU credit balance in the EC2 console.

  • Garbage Collection or JVM Heap Size in JMeter: If you're using JMeter, remember that it's a Java application. Java applications use a heap for memory, and when this heap becomes full, the JVM needs to perform garbage collection, which can slow down your application. Check the JVM heap size settings for your JMeter instances.
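
To check for instance-level network throttling directly, the ENA driver exposes counters that increase whenever traffic is shaped because a bandwidth, packets-per-second, or connection-tracking allowance was exceeded. Below is a minimal sketch, assuming it runs on the node itself (for example from a privileged pod or over SSH/SSM), that the primary interface is eth0, and that the ENA driver is recent enough to expose these statistics:

```python
# Sketch: read the ENA allowance-exceeded counters on an EC2 node.
# Assumptions: runs on the instance itself, primary interface is eth0,
# and the ENA driver version exposes these statistics.
import subprocess

stats = subprocess.run(
    ["ethtool", "-S", "eth0"],
    capture_output=True, text=True, check=True,
).stdout

# Non-zero, growing values here mean the instance is being throttled
# at the hypervisor level (bandwidth, packets per second, conntrack, ...).
for line in stats.splitlines():
    if "allowance_exceeded" in line:
        print(line.strip())
```

If bw_out_allowance_exceeded keeps growing during the slow runs, the web server node is being throttled at the instance level, and a larger instance type (or spreading the load across more nodes) would be the fix.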

To debug this issue, you can start by checking the metrics for your EC2 instances and EKS nodes in CloudWatch to see if there are any obvious resource shortages. You can also look at the logs for your JMeter pods to see if there are any error messages or warnings.
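
For example, here is a minimal sketch with boto3 (assuming AWS credentials are configured, and using a placeholder region and instance ID) that pulls NetworkOut and the T3 CPU credit metrics for the web server instance, so the first fast run can be compared with the later, slower runs:

```python
# Sketch: pull NetworkOut and CPU credit metrics for one EC2 instance.
# Assumptions: boto3 is installed, AWS credentials are configured, and the
# region and instance ID below are replaced with your actual values.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # adjust region
instance_id = "i-0123456789abcdef0"  # placeholder: the web server node's instance ID

end = datetime.now(timezone.utc)
start = end - timedelta(hours=3)  # wide enough to cover the fast run and the slow reruns

for metric, stat in [("NetworkOut", "Sum"),
                     ("CPUCreditBalance", "Average"),
                     ("CPUSurplusCreditBalance", "Average")]:
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName=metric,
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=300,          # 5-minute datapoints
        Statistics=[stat],
    )
    print(f"--- {metric} ---")
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"].isoformat(), point[stat])
```

A NetworkOut curve that plateaus at a lower level in the later runs points at the bandwidth side, while a CPUCreditBalance that drains toward zero points at the credit side.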

If you suspect network throttling, you can try a larger EC2 instance type with a higher network performance limit. If you suspect the issue is related to CPU credits, you could enable unlimited bursting on the T3 instances or move to a non-burstable instance type.
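
If the credit balance does turn out to be the problem and you want to stay on t3.xlarge, switching the instances to the unlimited credit mode is one option. A sketch of that call with boto3 (placeholder region and instance ID again):

```python
# Sketch: switch a t3 instance to "unlimited" CPU credit mode.
# Assumptions: boto3 is configured; replace the region and instance ID.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # adjust region

response = ec2.modify_instance_credit_specification(
    InstanceCreditSpecifications=[
        {"InstanceId": "i-0123456789abcdef0", "CpuCredits": "unlimited"}
    ]
)
print(response["SuccessfulInstanceCreditSpecifications"])
print(response["UnsuccessfulInstanceCreditSpecifications"])
```

Keep in mind that unlimited mode can incur additional charges when the instance sustains CPU usage above its baseline, and for a managed node group you would normally set the credit specification in the launch template rather than on individual instances.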

Finally, if none of that works, you can also consider reaching out to AWS Support for assistance in debugging this issue. They might be able to provide more detailed insights based on your specific configuration and workload.

answered 10 months ago
  • Ivan Casco, thank you for answering my question. But I performed the same benchmark test in March 2023 and did not face this issue. It was the same cluster in the same AWS region, and all the test parameters are identical; I just did not hit the bandwidth throttling issue back then the way I do now.

    Also, t3.xlarge can burst up to 5 Gbps, so I don't think I have hit the bandwidth limit. The pods did not run out of memory, and no memory alerts were observed.

    Did something change between March and June 2023? Perhaps the AMI version or the Amazon VPC CNI version, which could be causing this issue?
