SQS SendMessage timing out or failing randomly

Hello. We have a serverless application for handling email messages.

I have a Node Lambda function triggered by SES that is sending a message to SQS with some information about the email in order to process it later with another lambda function. The application is already running as expected.

But we randomly receive alerts about the SQS SendMessage SDK call (sqs.sendMessage()) timing out or failing with "write EPIPE":

  • Connection timed out after 120000ms
  • "write EPIPE {<message>}"

The last message that failed was a JSON string of about 10 KB, so size should not be a problem. Also, the Lambda function is running in a VPC, but it has no issues with 99% of the emails received.

I have no clue why this is happening, so any help is greatly appreciated.

Thanks

asked a year ago, 1959 views
1 Answer

What you're probably finding is that the error appears when the Lambda function has sent one message to SQS successfully but the next message arrives a few minutes later. Under the hood, the Lambda function will have been suspended (waiting for the next message) while the Lambda service keeps it "warm", and in the meantime the TCP session between the Lambda function and SQS has timed out.

If your function were triggered more often, you would be unlikely to see this message, because the Lambda suspension time would be shorter than the TCP session timeout.

I'm not sure what runtime language you're using, but it's likely that you're initialising the SQS connection at function start. Instead, consider doing that right before the SQS message is sent.
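As a rough illustration of that suggestion, here is a minimal TypeScript sketch. It does not use the AWS SDK: `QueueClient`, `createQueueClient`, `MailEvent`, and the `clientsCreated` counter are hypothetical stand-ins so the pattern can be run on its own. In a real function the factory is where the SDK's SQS client would be constructed, and the event shape would be whatever SES actually delivers.

```typescript
// Hypothetical stand-in for an SQS client; a real handler would use the AWS SDK client.
interface QueueClient {
  sendMessage(body: string): Promise<string>;
}

// Counts how many clients were built, only to make the pattern observable.
let clientsCreated = 0;

// Hypothetical factory. With the AWS SDK, this is where the SQS client would be constructed.
function createQueueClient(): QueueClient {
  clientsCreated += 1;
  const id = clientsCreated;
  return {
    async sendMessage(body: string): Promise<string> {
      // Pretend the queue accepted the message and return a fake receipt.
      return `client-${id}:${body.length}`;
    },
  };
}

// Assumed event shape: only the fields mentioned in the question.
interface MailEvent {
  messageId: string;
  headers: Record<string, string>;
}

async function handler(event: MailEvent): Promise<string> {
  // Build the client inside the invocation rather than at module scope,
  // so each invocation starts with a freshly created client object.
  const queue = createQueueClient();
  const body = JSON.stringify({ messageId: event.messageId, headers: event.headers });
  // Await the send so the handler does not return while it is in flight.
  return await queue.sendMessage(body);
}
```

The trade-off is a small per-invocation setup cost in exchange for not holding a client across suspensions. Whether this actually avoids a stale socket depends on how the real SDK pools connections, so treat it as something to try rather than a guaranteed fix.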

AWS
EXPERT
answered a year ago
  • Thanks for replying and giving some light.

    My function is written for Node.js in TypeScript: https://gist.github.com/rumeau/936b924c8a43c4116a12fcbea7034a92/revisions

    And though I have another function failing with similar errors, this one seems to be very simple.

    It consists of a function triggered by an SES action, which sends the messageId and the message headers to SQS so that a later function can process the received message.

    I'm now changing my function to fix the async request to SQS and to initialise the SQS connection right before sending the message, to see if that fixes the issue.

    Thanks

  • Hello. Well, unfortunately the problem persists for the EPIPE exception.

    From what I can see, the exception is thrown during the following Lambda invocation. I'm attaching a screenshot of the CloudWatch log: https://pasteboard.co/1IaCsmRwzZI3.png

    As you can see, the exception for 63ecb8bd-3cb4-4f85-ac88-0ddb9fd4c9b2 is thrown once 733241e3-d1ef-4ab9-bbe8-97972b3ac050 has started, and apparently after 63ecb8bd-3cb4-4f85-ac88-0ddb9fd4c9b2 has already enqueued the message without errors.

  • Be careful with async code in Lambda functions. Once the "main" code returns in Lambda, the function will be suspended. You need to ensure that you wait for all of your async work to complete before returning. I know that's vague (I'm not a JavaScript expert) but I've seen it happen. That's probably why you're seeing the exceptions in the following invocation: that's when the function is "woken up" again and the pending work continues running.
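To make the last comment concrete, here is a small TypeScript sketch contrasting a handler that returns before its send has finished with one that awaits it. Nothing here is the AWS SDK: `fakeSend`, `delivered`, and both handler names are hypothetical, and the 20 ms timer merely simulates network latency.

```typescript
// Records what the fake queue has accepted, so the difference is observable.
const delivered: string[] = [];

// Hypothetical stand-in for an asynchronous SQS send (not the AWS SDK).
function fakeSend(body: string): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(() => {
      delivered.push(body);
      resolve();
    }, 20);
  });
}

// Problematic: the promise is not awaited, so the handler resolves while
// the send is still in flight. In Lambda, the environment may be suspended
// at this point and the send may only finish (or fail) in a later invocation.
async function fireAndForgetHandler(messageId: string): Promise<void> {
  void fakeSend(messageId);
}

// Safer: the handler resolves only after the send has settled.
async function awaitingHandler(messageId: string): Promise<void> {
  await fakeSend(messageId);
}
```

If the real handler mixes callbacks and promises, the same idea applies: every pending send should be awaited (for several sends, `Promise.all` over them) before the handler returns.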
