EC2 throughput drops when running performance tests with JMeter on an EKS cluster


I have an EKS cluster with 12 nodes, where 1 node acts as a web server and the other nodes generate HTTP traffic using JMeter slaves. I fetch a 10 MB file and the homepage, which is a few KB in size. The test setup is exactly like this: https://blog.kubernauts.io/load-testing-as-a-service-with-jmeter-on-kubernetes-fc5288bb0c8b

When I run the test with JMeter right after bringing up the nodes, I get the expected response time and throughput on the first run (I have a baseline that I captured a couple of months ago, which I am comparing against). When I rerun the same test on the same nodes a few minutes later, I don't get the desired performance numbers: throughput is low and response time is high. I am using t3.xlarge nodes with Kubernetes v1.27. Is there any rate limit which could be causing the performance drop?

I have attached the NetworkOut graph from the web server EC2 instance. In the first test it served about 37 GB of data; subsequent tests served significantly less.

[Image: NetworkOut graph from the web server EC2 instance]

Node group info: [Image: node group configuration]

1 Answer

There could be several reasons why the performance of your EKS cluster might decrease over time. Here are some possibilities:

  • Network Throttling on t3.xlarge instances: AWS EC2 instances come with a defined network performance capacity. For the t3.xlarge instance type this is listed as "Up to 5 Gbps", but that figure is a burst ceiling: the sustained baseline bandwidth is considerably lower, and once the instance's network burst allowance is used up, throughput is held to the baseline. A first test run can therefore reach burst speed while later runs are throttled. If your tests are generating a large amount of network traffic, this alone could explain the drop; you can confirm it with the ENA driver's allowance-exceeded counters (see the sketch after this list).

  • Burstable Performance Instances (CPU credits): AWS t3 instances are burstable performance instances, which means that they provide a baseline level of CPU performance with the ability to burst above the baseline. If you're exhausting your CPU credits, it might impact your test performance over time. You can check your CPU credit balance in the EC2 console.

  • Garbage Collection or JVM Heap Size in JMeter: If you're using JMeter, remember that it's a Java application. Java applications use a heap for memory, and when this heap becomes full, the JVM needs to perform garbage collection, which can slow down your application. Check the JVM heap size settings for your JMeter instances.
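
To check for instance-level network throttling directly, the ENA driver exposes counters that increase whenever traffic is shaped because a bandwidth, packets-per-second, or connection-tracking allowance was exceeded. Below is a minimal sketch, assuming it runs on the node itself (for example from a privileged pod or over SSH/SSM), that the primary interface is eth0, and that the ENA driver is recent enough to expose these statistics:

```python
# Sketch: read the ENA allowance-exceeded counters on an EC2 node.
# Assumptions: runs on the instance itself, primary interface is eth0,
# and the ENA driver version exposes these statistics.
import subprocess

stats = subprocess.run(
    ["ethtool", "-S", "eth0"],
    capture_output=True, text=True, check=True,
).stdout

# Non-zero, growing values here mean the instance is being throttled
# at the hypervisor level (bandwidth, packets per second, conntrack, ...).
for line in stats.splitlines():
    if "allowance_exceeded" in line:
        print(line.strip())
```

If bw_out_allowance_exceeded keeps growing during the slow runs, the web server node is being throttled at the instance level, and a larger instance type (or spreading the load across more nodes) would be the fix.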

To debug this issue, you can start by checking the metrics for your EC2 instances and EKS nodes in CloudWatch to see if there are any obvious resource shortages. You can also look at the logs for your JMeter pods to see if there are any error messages or warnings.
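
For example, here is a minimal sketch with boto3 (assuming AWS credentials are configured, and using a placeholder region and instance ID) that pulls NetworkOut and the T3 CPU credit metrics for the web server instance, so the first fast run can be compared with the later, slower runs:

```python
# Sketch: pull NetworkOut and CPU credit metrics for one EC2 instance.
# Assumptions: boto3 is installed, AWS credentials are configured, and the
# region and instance ID below are replaced with your actual values.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # adjust region
instance_id = "i-0123456789abcdef0"  # placeholder: the web server node's instance ID

end = datetime.now(timezone.utc)
start = end - timedelta(hours=3)  # wide enough to cover the fast run and the slow reruns

for metric, stat in [("NetworkOut", "Sum"),
                     ("CPUCreditBalance", "Average"),
                     ("CPUSurplusCreditBalance", "Average")]:
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName=metric,
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=300,          # 5-minute datapoints
        Statistics=[stat],
    )
    print(f"--- {metric} ---")
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"].isoformat(), point[stat])
```

A NetworkOut curve that plateaus at a lower level in the later runs points at the bandwidth side, while a CPUCreditBalance that drains toward zero points at the credit side.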

If you suspect network throttling, you can try a larger EC2 instance type with a higher network performance limit. If you suspect the issue is related to CPU credits, you could enable unlimited bursting on the T3 instances or move to a non-burstable instance type.
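
If the credit balance does turn out to be the problem and you want to stay on t3.xlarge, switching the instances to the unlimited credit mode is one option. A sketch of that call with boto3 (placeholder region and instance ID again):

```python
# Sketch: switch a t3 instance to "unlimited" CPU credit mode.
# Assumptions: boto3 is configured; replace the region and instance ID.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # adjust region

response = ec2.modify_instance_credit_specification(
    InstanceCreditSpecifications=[
        {"InstanceId": "i-0123456789abcdef0", "CpuCredits": "unlimited"}
    ]
)
print(response["SuccessfulInstanceCreditSpecifications"])
print(response["UnsuccessfulInstanceCreditSpecifications"])
```

Keep in mind that unlimited mode can incur additional charges when the instance sustains CPU usage above its baseline, and for a managed node group you would normally set the credit specification in the launch template rather than on individual instances.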

Finally, if none of that works, you can also consider reaching out to AWS Support for assistance in debugging this issue. They might be able to provide more detailed insights based on your specific configuration and workload.

answered 10 months ago
  • Ivan Casco, thank you for answering my question. But I performed the same benchmark test in March 2023 and did not face this issue. It was the same cluster in the same AWS region, and all the test parameters are identical; I just did not hit the bandwidth throttling issue back then the way I do now.

    Also, t3.xlarge can burst up to 5 Gbps, so I don't think I have hit the bandwidth limit. The pods did not run out of memory, and no memory alerts were observed.

    Did something change between March and June 2023? Perhaps the AMI version or the Amazon VPC CNI version, which could be causing this issue?
