Latency in Step Functions Express Workflow invoked from AWS SDK for Java v2


Hello everyone, I am working on a POC to measure the response time of Step Functions for various numbers of concurrent requests. I have taken two approaches: (1) execute the Step Function from API Gateway, and (2) execute the Step Function from the AWS SDK for Java v2.

I am using the Apache Benchmark tool to send 500, 1000, 3000 and 5000 requests, each run with 200 concurrent requests. The response time I get for the API Gateway scenario is around 200 to 300 ms, but for the AWS SDK it is in the range of 200 to 400 ms. I was expecting it to be the other way around, since the SDK call does not go through API Gateway and hits the Step Functions StartSyncExecution API directly, but strangely the latency is higher for the SDK.

For a single request, the AWS SDK (avg response = 90 ms) does perform better than the API Gateway approach (avg response = 150 ms).

I have created a REST service using Spring Boot and the AWS SDK to invoke the Step Function, and I am testing it on an EC2 machine. I have used an Express Workflow, and it orchestrates two API calls through Lambda. No other logic is present inside the Step Function. I tried changing the Tomcat max thread pool to 500 and used ApacheHttpClient to raise the max HTTP connections to 300 (from the default of 50), but it does not help. It actually worsens the latency and makes it > 1 sec. Any suggestions for optimization would be appreciated.

1 Answer


I understand that you are working on a POC to measure Step Functions response time while invoking the Step Function (1) from API Gateway and (2) from the AWS SDK for Java v2 running on EC2. After conducting multiple concurrency tests, you found that requests made from the AWS SDK take slightly longer than those made via API Gateway.

To investigate this issue, we need to know the EC2 instance type and where the API Gateway is hosted. Are you using AWS API Gateway or a third-party API gateway? You mentioned that with a single request the SDK performs better, so the latency could be related to EC2 capacity. Generally, a larger EC2 instance type provides higher capacity and faster processing. Have you already benchmarked with different EC2 instance types? Finally, what is the network path from EC2 to Step Functions: does it traverse PrivateLink through a VPC endpoint, or the public Internet?
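
One way to reason about client-side capacity is a rough queueing estimate. The sketch below is not a measurement. It assumes a closed loop of callers (like 200 concurrent benchmark clients), that each request holds one pooled HTTP connection for the whole single-request time (the 90 ms from the question), and that the connection pool is the only bottleneck. The class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate based on Little's law (N = X * R).
// Assumptions: closed loop of callers, one pooled connection held per request
// for the full service time, and no bottleneck other than the pool.
final class PoolLatencyEstimate {

    /** Estimated mean response time (ms) when `concurrency` callers share `poolSize` connections. */
    static double estimateMillis(int concurrency, int poolSize, double serviceMillis) {
        if (concurrency <= poolSize) {
            return serviceMillis; // every caller gets a connection immediately
        }
        // Saturated pool: throughput X = poolSize / serviceMillis,
        // so response time R = N / X = concurrency * serviceMillis / poolSize.
        return concurrency * serviceMillis / poolSize;
    }

    public static void main(String[] args) {
        double singleRequestMillis = 90.0; // single-request average quoted in the question
        System.out.printf("pool=50:  %.0f ms%n", estimateMillis(200, 50, singleRequestMillis));
        System.out.printf("pool=300: %.0f ms%n", estimateMillis(200, 300, singleRequestMillis));
    }
}
```

Under these assumptions, a pool of 50 connections shared by 200 concurrent callers already lands in the few-hundred-millisecond range, while a pool of 300 should add no waiting at all. If latency still rises above 1 second with the larger pool, that would suggest a different bottleneck on the instance, such as CPU or thread contention, rather than the pool size itself.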

We recommend using AWS X-Ray with Step Functions to determine where the request latency for your state machine comes from. X-Ray measures invocation time, state transition time, the overall execution time of Step Functions, and variances in this execution time. X-Ray provides visualization and analysis tools to trace Step Functions requests. You can also integrate X-Ray with AWS API Gateway and the AWS SDK for end-to-end latency tracing.
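
As a complement to service-side tracing, it can help to time only the outbound call inside the client under the same concurrency as the benchmark, so that time spent in the SDK call can be separated from time spent elsewhere in the service. The following is a minimal sketch using only the Java standard library. `ConcurrentLatencyProbe` is a hypothetical helper, and the `Thread.sleep` in `main` is a stand-in for the real SDK invocation, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentLatencyProbe {

    /** Runs `task` `requests` times on `concurrency` threads; returns per-call latencies in ms, sorted ascending. */
    static double[] measure(Callable<?> task, int requests, int concurrency) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            // Times only the call itself, not the time spent waiting for a worker thread.
            Callable<Double> timed = () -> {
                long start = System.nanoTime();
                task.call();
                return (System.nanoTime() - start) / 1_000_000.0;
            };
            List<Future<Double>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(timed));
            }
            double[] latencies = new double[requests];
            for (int i = 0; i < requests; i++) {
                latencies[i] = futures.get(i).get();
            }
            Arrays.sort(latencies);
            return latencies;
        } finally {
            pool.shutdown();
        }
    }

    /** Nearest-rank percentile of an ascending-sorted array, for p in (0, 100]. */
    static double percentile(double[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; replace the sleep with the call being investigated.
        double[] ms = measure(() -> { Thread.sleep(20); return null; }, 100, 20);
        System.out.printf("p50=%.1f ms, p99=%.1f ms%n", percentile(ms, 50), percentile(ms, 99));
    }
}
```

Comparing these per-call percentiles with the end-to-end numbers from the benchmark tool indicates whether the extra latency accumulates inside the outbound call or in the surrounding service (request threads, serialization, connection setup).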

For optimization, you can consider using a Map state or Parallel state for concurrent executions, if applicable to your use case.

Please feel free to follow up with any additional questions or concerns.

answered 2 years ago
  • I am using the t2.xlarge EC2 instance type for both tests, and AWS API Gateway. The network path from EC2 to Step Functions is via PrivateLink using a VPC; it is not using the public Internet. The Map and Parallel states are not relevant to our use case, so they cannot be used. I did try X-Ray for both tests and found that the Step Function execution time is a very small fraction of the total response time, so it is not the cause of the latency in either case. I am still not sure what could be the reason for the latency with the AWS SDK.
