Load testing serverless stack using Gatling

Hi,
I'm load testing my serverless app and I see that it cannot handle higher loads. The stack is API Gateway, Lambda (Java 8), and DynamoDB. The code that I'm using is the same as the code at this link.
I'm using Gatling for the load test. The configured load starts with 120 users at once, then ramps from 120 to 400 users per second over one minute, and then holds a constant 400 users per second for 2 minutes. The problem is that my stack is unable to handle 400 users per second. Is this normal? I thought that serverless would scale nicely and work like a charm.

Here is my Gatling simulation code:

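import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;
import java.time.Duration;

public class OneEndpointSimulation extends Simulation {

    HttpProtocolBuilder httpProtocol = http
            .baseUrl("url") // Here is the root for all relative URLs
            .acceptHeader("text/html,application/xhtml+xml,application/json,application/xml;q=0.9,*/*;q=0.8") // Here are the common headers
            .acceptEncodingHeader("gzip, deflate")
            .acceptLanguageHeader("en-US,en;q=0.5")
            .userAgentHeader("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0");

    // Each virtual user sends a single GET request and then pauses for 1 second
    ScenarioBuilder scn = scenario("Scenario 1 Workload 2")
            .exec(http("Get all activities")
                    .get("/activitiesv2")).pause(1);

    {
        // Open workload model: 120 users at once, then ramp the arrival rate
        // from 120 to 400 users/sec over 60 seconds, then hold 400 users/sec for 2 minutes
        setUp(scn.injectOpen(atOnceUsers(120),
                        rampUsersPerSec(120).to(400).during(Duration.ofSeconds(60)),
                        constantUsersPerSec(400).during(Duration.ofMinutes(2))
                ).protocols(httpProtocol)
        );
    }

}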
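
As a rough sanity check on the scale of this test, the injection profile can be tallied with plain arithmetic. The sketch below uses no Gatling APIs; the class and method names are made up for illustration, and it assumes the ramp is linear, so the number of users it injects is the average rate times the duration.

```java
import java.time.Duration;

public class InjectionProfileEstimate {

    // Users injected by a linear ramp: average of the start and end rates
    // (users/sec) multiplied by the ramp duration in seconds.
    static double rampUsers(double startRate, double endRate, Duration duration) {
        return (startRate + endRate) / 2.0 * duration.getSeconds();
    }

    // Users injected at a constant arrival rate (users/sec) over the duration.
    static double constantUsers(double rate, Duration duration) {
        return rate * duration.getSeconds();
    }

    // Total for the profile described in the question:
    // 120 at once, ramp 120 -> 400 users/sec over 60 s, then 400 users/sec for 2 min.
    static double totalUsers() {
        double atOnce = 120;
        double ramp = rampUsers(120, 400, Duration.ofSeconds(60));
        double steady = constantUsers(400, Duration.ofMinutes(2));
        return atOnce + ramp + steady;
    }

    public static void main(String[] args) {
        System.out.printf("Ramp phase users:   %.0f%n", rampUsers(120, 400, Duration.ofSeconds(60)));
        System.out.printf("Steady phase users: %.0f%n", constantUsers(400, Duration.ofMinutes(2)));
        System.out.printf("Total users:        %.0f%n", totalUsers());
    }
}
```

Under those assumptions the profile starts roughly 63,720 virtual users in about three minutes, and each one issues a single request.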

Here are the Gatling report results:
Image link
I'm also receiving an error:
**i.n.h.s.SslHandshakeTimeoutException: handshake timed out after 10000ms**
This usually affects approximately 50 requests. It happens when Gatling starts to inject 400 constant users per second.
I'm wondering what could be wrong.
Is this too much for API Gateway, Lambda, and DynamoDB?

2 Answers

There have definitely been large-scale events running on AWS that exceed the performance parameters you mention. For example, Prime Day 202. However, these events generally happen after careful planning and design as well as a bunch of testing (which I'm very glad to see you doing).

One of the things that usually pops up during stress/load testing is the set of quotas placed on each account for various services. I suspect that you may be bumping into those.

One thing you can do in the short term is to request limit increases for the services that you are using via the AWS console.

However, I'd also strongly recommend that you reach out to your local AWS Solutions Architect who can work with you on your application design - that's our job and it's what we're here for (to help customers get the best out of the platform).

AWS
EXPERT
answered 2 years ago
  • How can I reach an AWS Solutions Architect? Is there a dedicated service for that?

  • I would start by checking LinkedIn in your local area; it's not possible (yet!) on re:Post to send personal messages otherwise I could recommend someone directly.

Your test results show long response times (about 10,000 ms at the 90th percentile). It looks like there is a bottleneck somewhere (e.g. DynamoDB throttling on capacity units, combined with retry logic). You may be able to find the cause of the high latency by activating X-Ray for the API Gateway stage and the Lambda function.
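
A back-of-the-envelope way to see why response time matters at this arrival rate is Little's law: the average number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system. The sketch below is plain Java with hypothetical names. It uses the 90th percentile as a pessimistic stand-in for the mean, and the 1,000 figure is only an assumed concurrency quota for illustration, so check the actual value for your account in the Service Quotas console.

```java
public class ConcurrencyEstimate {

    // Little's law: in-flight requests = arrival rate (req/s) x time in system (s).
    static double inFlightRequests(double arrivalsPerSecond, double latencyMillis) {
        return arrivalsPerSecond * latencyMillis / 1000.0;
    }

    public static void main(String[] args) {
        double arrivalsPerSecond = 400;   // constant phase from the question
        double latencyMillis = 10_000;    // ~p90 from the report; an upper-bound stand-in for the mean
        double assumedQuota = 1_000;      // ASSUMPTION: illustrative concurrency quota, not a measured value

        double inFlight = inFlightRequests(arrivalsPerSecond, latencyMillis);
        System.out.printf("Estimated requests in flight: %.0f%n", inFlight);
        System.out.printf("Exceeds assumed quota of %.0f: %b%n", assumedQuota, inFlight > assumedQuota);
    }
}
```

With these inputs the estimate is about 4,000 requests in flight. If the same function answered in 250 ms, the estimate would drop to about 100. The 90th percentile overstates the mean, so treat the first figure as an upper bound rather than a measurement.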

AWS
answered 2 years ago
  • Hi, I checked the logs and cannot see any throttled requests. CloudWatch also reports no throttled requests.
