Hi,
Thanks for reaching out. From the information provided, there does not appear to be any obvious issue with your setup. As far as I can tell, you are already following best practices, such as creating the MongoClient outside of the handler function. This should ensure that a warm Lambda reuses the connection on subsequent requests.
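To illustrate the reuse described above, here is a minimal sketch. The createClient() function is a stand-in stub so the example runs on its own (a real one would construct the driver's client), and ping() and clientsCreated are hypothetical names used only to show the effect.

```javascript
// Hypothetical stand-in for a database driver client; a real createClient()
// would construct the driver's client instead of this stub object.
let clientsCreated = 0;
function createClient() {
  clientsCreated++;
  return {
    connect: async function () { return this; },
    ping: async () => "pong",
  };
}

// Module scope: runs once per execution environment (cold start),
// not once per invocation.
const clientPromise = createClient().connect();

// Every invocation awaits the same promise, so warm invocations
// reuse the connection that was already established.
async function handler(event) {
  const client = await clientPromise;
  return client.ping();
}
```

Calling handler several times in the same process should leave clientsCreated at 1, which mirrors what happens across warm invocations.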
You mentioned that the Lambda is triggered by SNS with roughly 1 million requests per 24 hours, and that you see about 100 errors in the same time frame, which is an error rate of about 0.01%. As you may already know, Lambda is a highly distributed system, and some intermittent or transient issues, usually networking related, are to be expected.
The issue does not necessarily have to be with the Lambda service itself. Each external request made by the Lambda function can pass through numerous components such as DNS servers, load balancers, switches, and the like. From the errors here, it seems that the Lambda function made a request to the Mongo server and that request timed out. The request could have been dropped at any of the network components mentioned above. You can read more about this here in our documentation.
The best practice/recommendation here is simply to retry the request. Since the Lambda is being triggered by SNS, we will not be able to make the client (SNS) retry the actual request. Therefore, this must be handled inside the Lambda function itself. You can add code to retry the request in Lambda, or you can handle it differently, such as re-sending the message to the SNS topic. It seems you already have a catch statement here. May I ask if you have added any retry behavior to your code for when these timeout issues occur?
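As a rough illustration of retrying inside the function, here is a minimal sketch of a retry helper with exponential backoff. The names withRetry, sleep, attempts, and baseDelayMs are hypothetical and not taken from your code, and the delays are placeholders you would tune for your workload.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async operation up to `attempts` times, doubling the delay
// between attempts. All names and defaults here are illustrative.
async function withRetry(operation, attempts = 3, baseDelayMs = 100) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        // With the defaults: wait 100 ms, then 200 ms, before retrying.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Inside the handler, the call that times out could then be wrapped, for example as withRetry(() => doRequest()), where doRequest is whatever function performs the database call.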
Since the vast majority of your Lambda invocations are successful, retrying the request should lead to a successful execution. Please do let me know if you have any questions regarding this.
Hi Ryan,
Thanks for the detailed answer. Am I correct in then assuming that this would be considered a "normal" error in that there's nothing to be done outside of implementing redundancy in the Lambda?
In relation to this: "Since the Lambda is being triggered by SNS, we will not be able to trigger the client(SNS) to retry the actual request", I thought SNS automatically retries delivery 3 times? https://docs.aws.amazon.com/sns/latest/dg/sns-message-delivery-retries.html. Is that just for when the SNS is under YOUR control, vs when subscribing to an external queue?
As a second question, I implemented basic retrying along the lines of:
    const client = createClient();
    export const mongoClient = client.connect()
      .then(client => client)
      .catch(_ => {
        const client = createClient();
        return client.connect();
      })
      .then(client => client)
      ...
      .catch(err => {
        throw err; // throw it to recatch later
      });
for a total of 3 attempts, though it still fails (3 separate timeouts). If a lambda/connection fails once, can we retry in that lambda, or is it considered "burned" and I need to retrigger a brand new lambda?
Hi,
Thanks for your reply. SNS will only follow that retry behavior if it is unable to reach the Lambda endpoint. SNS invokes Lambda asynchronously but if it is unable to do so then it will retry. In this case, SNS was able to reach the Lambda endpoint so from the point of view of SNS it has successfully passed the event to Lambda. It is mentioned here in the documentation ->
The delivery policy defines how Amazon SNS retries the delivery of messages when server-side errors occur (when the system that hosts the subscribed endpoint becomes unavailable)
It is strange that it fails with a timeout each time. You should be able to retry in the Lambda if the request fails. The various SDKs, for example, have built-in retry behavior. Are you recreating the connection that was created outside of the handler function?
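One thing that may be worth checking, assuming the connection promise is created once at module scope: a rejected promise stays rejected, so every later invocation in the same warm environment that awaits it would see the same error. This is an assumption about the setup rather than a diagnosis. The sketch below shows one way to avoid that by clearing the cached promise on failure. createClient() is a stub that fails twice to simulate transient timeouts, and getClient, cachedClientPromise, and connectAttempts are hypothetical names.

```javascript
// Hypothetical stand-in for createClient(); the first two connection
// attempts fail in order to simulate transient timeouts.
let connectAttempts = 0;
function createClient() {
  return {
    connect: async function () {
      connectAttempts++;
      if (connectAttempts <= 2) throw new Error("connection timed out");
      return this;
    },
  };
}

let cachedClientPromise = null;

// Returns the cached connection promise, creating it on first use.
// If the attempt fails, the cache is cleared so the next call starts
// over with a fresh client instead of reusing the rejected promise.
function getClient() {
  if (cachedClientPromise === null) {
    cachedClientPromise = createClient().connect().catch((err) => {
      cachedClientPromise = null;
      throw err;
    });
  }
  return cachedClientPromise;
}

async function handler(event) {
  const client = await getClient();
  return "connected";
}
```

With this shape, a failed connection affects only the invocation that hit it. The next invocation in the same environment creates a new client rather than awaiting the old rejection.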
Hi Ryan, Thanks for the clarification on the SNS delivery behaviour. For the retry, yes, the code above is created in a separate file, then the result is awaited in the handler itself. Is there an example (e.g. git) somewhere that I can compare against? Or, if necessary, I can separate out the code in question and send it somewhere. As it stands, in the file where I create the client and call connect(), I catch any errors, then create a new client (createClient()) and retry the connection. I do this 3 times before throwing, hence my question on whether a lambda can be considered "burned" (it'll never work). Thanks