Hi,
My suggestion is to:
- Use S3 Replication to copy to the target bucket: it scales very high, with high resilience and no effort on your side (much less work than Lambda); a configuration sketch follows below.
- Use an S3 trigger to start a Lambda (see https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html) when the object is replicated to the target bucket: this Lambda will "modify its folder structure and object key", as per your own words.
It works fine to have the trigger on the target bucket, since S3 folders do not really exist and are just a mental model for users: see https://medium.com/@chamaln/why-s3-folders-dont-really-exist-35f29cf70477
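If it helps, here is a minimal sketch of the replication side, assuming Python with boto3, hypothetical bucket names (`source-bucket`, `target-bucket`), an existing replication IAM role, and versioning enabled on both buckets (which S3 Replication requires):

```python
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "source-bucket"   # hypothetical names
TARGET_BUCKET = "target-bucket"
REPLICATION_ROLE = "arn:aws:iam::123456789012:role/s3-replication-role"  # assumed to exist

# S3 Replication requires versioning on both buckets.
for bucket in (SOURCE_BUCKET, TARGET_BUCKET):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

# Replicate every new object from source to target.
s3.put_bucket_replication(
    Bucket=SOURCE_BUCKET,
    ReplicationConfiguration={
        "Role": REPLICATION_ROLE,
        "Rules": [{
            "ID": "replicate-all",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {},  # empty filter = all objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": f"arn:aws:s3:::{TARGET_BUCKET}"},
        }],
    },
)
```

The key transformation itself then happens in the Lambda triggered on the target bucket, as discussed in the comments below.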
Best,
Didier
To address the Lambda throttling and scaling problems you are seeing with high-frequency file uploads to S3, I recommend an event-driven architecture using S3 -> SQS -> Lambda. This approach helps absorb large traffic spikes and prevents errors from hitting Lambda's maximum concurrent invocations. Here's why this solution is effective and how to implement it:
- Use S3 event notifications to send messages to an SQS queue when new objects are uploaded.
- Configure your Lambda function to use the SQS queue as its event source instead of triggering directly from S3 events (see the wiring sketch after this list).
- Use a Standard SQS queue rather than a FIFO queue. Standard queues offer higher throughput and are better suited for this use case, where exact ordering is not critical.
- Implement batch processing in your Lambda function. Configure the Lambda to process multiple messages from the queue in a single invocation. This reduces the number of Lambda invocations and helps manage concurrency more effectively (a handler sketch appears after the list of advantages below).
- Set up a batch window for your Lambda function. AWS Lambda supports batch windows of up to 5 minutes for functions triggered by SQS. This lets you accumulate multiple file events before processing them, which helps smooth out traffic spikes.
- Adjust the visibility timeout and message retention period of your SQS queue so that messages are not lost if processing takes longer than expected.
- Monitor the SQS queue depth and Lambda concurrency metrics in CloudWatch. Set up alarms to notify you if the queue starts to back up or if Lambda concurrency approaches its limit (an alarm sketch appears at the end of this answer).
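As a rough wiring sketch for the queue, notification, and event source mapping steps above, assuming Python with boto3 and hypothetical names (`uploads-bucket`, `upload-events-queue`, and a Lambda function called `process-uploads`):

```python
import json
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

BUCKET = "uploads-bucket"            # hypothetical source bucket
QUEUE_NAME = "upload-events-queue"   # hypothetical Standard queue
FUNCTION = "process-uploads"         # hypothetical Lambda function

# 1. Create a Standard SQS queue. The visibility timeout should comfortably
#    exceed the Lambda timeout (AWS recommends at least 6x the function timeout).
queue_url = sqs.create_queue(
    QueueName=QUEUE_NAME,
    Attributes={
        "VisibilityTimeout": "900",         # 15 minutes
        "MessageRetentionPeriod": "345600", # 4 days, so unprocessed events are not lost quickly
    },
)["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# 2. Allow S3 to send messages to the queue.
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={
        "Policy": json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                "Condition": {"ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{BUCKET}"}},
            }],
        })
    },
)

# 3. Point S3 object-created notifications at the queue instead of the Lambda.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    },
)

# 4. Make the queue the Lambda's event source, with batching and a batch
#    window (up to 300 seconds) to smooth out spikes.
lambda_client.create_event_source_mapping(
    EventSourceArn=queue_arn,
    FunctionName=FUNCTION,
    BatchSize=100,
    MaximumBatchingWindowInSeconds=60,
    FunctionResponseTypes=["ReportBatchItemFailures"],
)
```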
This approach offers several advantages:
- It decouples event generation (S3 uploads) from event processing (the Lambda function), allowing better handling of traffic spikes.
- SQS acts as a buffer, smoothing out traffic and preventing Lambda from being overwhelmed during high-volume periods.
- It allows for more efficient use of Lambda concurrency, as you can process multiple files in a single invocation.
- It's more resilient to failures, as unprocessed messages remain in the queue and can be retried.
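To make the batching and retry behaviour concrete, here is a minimal handler sketch, assuming Python and a hypothetical `process_file` helper for the actual per-file work. With `ReportBatchItemFailures` enabled on the event source mapping (as in the wiring sketch above), only the failed messages return to the queue for retry rather than the whole batch:

```python
import json
from urllib.parse import unquote_plus

def process_file(bucket: str, key: str) -> None:
    """Hypothetical placeholder for the real per-file processing logic."""
    print(f"processing s3://{bucket}/{key}")

def handler(event, context):
    # Each SQS message body wraps one S3 event notification.
    failures = []
    for message in event.get("Records", []):
        try:
            body = json.loads(message["body"])
            # S3 test events ("s3:TestEvent") carry no "Records" key; skip them.
            for s3_record in body.get("Records", []):
                bucket = s3_record["s3"]["bucket"]["name"]
                key = unquote_plus(s3_record["s3"]["object"]["key"])  # keys arrive URL-encoded
                process_file(bucket, key)
        except Exception:
            # Report only this message as failed; the rest of the batch is
            # treated as successful and will not be redelivered.
            failures.append({"itemIdentifier": message["messageId"]})
    return {"batchItemFailures": failures}
```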
Implementing this solution should not cause significant downtime, as you can gradually transition to the new architecture:
- Set up the SQS queue and configure S3 event notifications to send to both the existing Lambda function and the new SQS queue.
- Create a new version of your Lambda function that processes events from SQS instead of directly from S3.
- Test the new setup thoroughly in a non-production environment.
- Once confident, switch the production Lambda to use the SQS queue as its event source.
- Monitor closely and adjust configurations as needed.
This approach should significantly reduce the stress on your Lambda function and help manage large traffic spikes more effectively.
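For the monitoring step, a queue-depth alarm could be created along these lines (a sketch assuming boto3, the hypothetical queue name from above, and an existing SNS topic for notifications):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm if the backlog stays above ~1,000 visible messages for 15 minutes.
cloudwatch.put_metric_alarm(
    AlarmName="upload-events-queue-backlog",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "upload-events-queue"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=1000,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # hypothetical topic
)
```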
Sources
AWS Lambda functions now scale 12 times faster when handling high-volume requests | AWS News Blog
Multiple Uploads to S3 trigger the Lambda Multiple times | AWS re:Post
Lambda behavior | AWS re:Post
Hi, but the Lambda function does some parsing, e.g. changing file names from hyphen (-) to underscore (_), which is not possible with S3 Replication alone; for that we need to write custom logic, for which AWS Lambda is the suitable service.
Yes, that's exactly my point: 1/ you replicate to the target bucket with the original name via S3 Replication; 2/ when the file reaches the destination via replication, it is processed by the Lambda trigger, where you can change hyphen to underscore, etc. My proposal boils down to doing the name transformation after replication rather than before, as you currently do.
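To make that concrete, here is a minimal sketch of such a rename Lambda on the destination bucket, assuming Python with boto3 and an `s3:ObjectCreated:*` trigger; the handler name and renaming rule are illustrative:

```python
import boto3
from urllib.parse import unquote_plus

s3 = boto3.client("s3")

def handler(event, context):
    """Triggered by s3:ObjectCreated:* on the replication destination bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])

        new_key = key.replace("-", "_")
        if new_key == key:
            # Already renamed (or nothing to change); this guard prevents the
            # copy below from re-triggering the function in an endless loop.
            continue

        # S3 has no real "rename": copy to the new key, then delete the original.
        # Note: copy_object handles objects up to 5 GB; larger objects need a
        # multipart copy (e.g. boto3's managed copy()).
        s3.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": key},
        )
        s3.delete_object(Bucket=bucket, Key=key)
```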