
workaround for HTTP Task occasionally failing with States.DataLimitExceeded


Currently using the arn:aws:states:::http:invoke integration to query a 3rd party API that returns records from a database in JSON object form. The retrieved data is passed via Output to the next state, which uses arn:aws:states:::aws-sdk:s3:putObject to write it out. The data is never processed inside StepFunctions.
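For reference, the control flow described above looks roughly like this in ASL (the endpoint, bucket, connection ARN, and field names are placeholders, not the actual definition):

```json
{
  "StartAt": "CallApi",
  "States": {
    "CallApi": {
      "Type": "Task",
      "Resource": "arn:aws:states:::http:invoke",
      "Parameters": {
        "ApiEndpoint": "https://api.example.com/records",
        "Method": "GET",
        "Authentication": {
          "ConnectionArn": "arn:aws:events:us-east-1:123456789012:connection/example/abc"
        }
      },
      "ResultSelector": {
        "Body.$": "$.ResponseBody"
      },
      "Next": "WriteToS3"
    },
    "WriteToS3": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:s3:putObject",
      "Parameters": {
        "Bucket": "example-bucket",
        "Key": "records.json",
        "Body.$": "$.Body"
      },
      "End": true
    }
  }
}
```

The response body travels through the state output between `CallApi` and `WriteToS3`, which is exactly where the size limit bites.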

Unfortunately, even with the smallest pagination limit, occasionally the JSON data exceeds the StepFunctions 128k (or is it 256k?) limit and execution fails with States.DataLimitExceeded.

The only workaround I can think of is (as usual) writing a Lambda, to perform the HTTPS request and write the output to S3. This adds code to an otherwise no-code scenario, and forces me to replicate the OAuth2 mechanism that the HTTP-Task / EventBridge Connection so nicely spared me.

While it makes total sense to limit state I/O size, in this case the data only passes through state I/O because there is no other choice; StepFunctions is purely a control plane for this task.

If the HTTP-Task integration had an option to store the response body to an S3 object, with a less strict size limit (a few MB?), instead of always returning the entire response inline, that would be a perfect workaround.

Or perhaps there is a different workaround I can use?

asked a year ago · 384 views
1 Answer

You're correct that AWS Step Functions has a payload size limit: 256 KB is the maximum size of the input or output of any state, including the data passed between states. This limit applies to tasks such as HTTP invocations, Lambda functions, and other integrated services.

For your scenario where the HTTP Task occasionally returns data exceeding this limit, there are a few potential workarounds:

  1. Use Amazon S3 for large payloads: This is the recommended approach for handling data that might exceed the 256 KB limit. Instead of passing large amounts of data through Step Functions state I/O, have some component outside Step Functions store the data in an S3 bucket, and pass only the S3 object key or ARN between states. That way the actual payload size no longer affects the execution.

  2. Optimize payload size: If possible, try to reduce the size of the data being returned by the API. This could involve filtering the data at the source or using compression techniques.

  3. Implement pagination: If the API supports it, you could implement pagination to retrieve the data in smaller chunks, processing each chunk separately.

  4. Use a Lambda function: As you mentioned, you could use a Lambda function to make the HTTP request and write the output directly to S3. While this does add code to your workflow, it gives you more control over handling large payloads.
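As a rough sketch of workaround 4 (all names and event fields here are hypothetical, and the OAuth2 flow that the EventBridge Connection normally handles is reduced to a bearer token passed in by the caller):

```python
import urllib.request


def handler(event, context, s3_client=None):
    """Fetch an API response and write it straight to S3,
    returning only a small reference to the next state."""
    if s3_client is None:
        import boto3  # available by default in the Lambda runtime
        s3_client = boto3.client("s3")

    # These event fields are assumptions about how the state machine
    # would invoke this function.
    req = urllib.request.Request(
        event["url"],
        headers={"Authorization": f"Bearer {event['token']}"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()

    s3_client.put_object(Bucket=event["bucket"], Key=event["key"], Body=body)

    # The state output stays tiny regardless of the response size.
    return {"bucket": event["bucket"], "key": event["key"], "size": len(body)}
```

The next state then receives only the S3 reference, so the 256 KB limit no longer depends on the size of the API response. The downside, as the question notes, is re-implementing the token handling that the EventBridge Connection otherwise provides.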

Unfortunately, there isn't currently a built-in option in the HTTP Task to directly store the response body to S3 with a larger size limit, as you suggested. That would indeed be a useful feature.

Given your requirements and the desire to maintain a no-code scenario, the best approach might be to work with the API provider to see if they can offer more granular pagination or data filtering options. This could help ensure that the returned data stays within the Step Functions payload limit.

If that's not possible, using S3 as an intermediary storage solution, either through a Lambda function or by modifying the API to return S3 references instead of full data payloads, would be the most robust solution to handle varying payload sizes.
Sources
AWS StepFunctions [HTTP Task]: when RequestBody + ResponseBody >= 256kb does not go into Catch Task. | AWS re:Post
Provide a payload greater than 256kb for step function http task | AWS re:Post
Best practices for Step Functions - AWS Step Functions

answered a year ago

