Hello. We do a lot of the same, although strictly speaking not to unzip files. At the time we implemented it, AWS Lambda compute resources were limited to 3 GB of RAM, which was breaking our processing, so we moved to AWS ECS + Fargate to do the whole job. This was superbly easy: all I had to do was take the code the devs wrote and implement the few lines of SQS logic (poll / change visibility / delete messages) that AWS Lambda handled for us at the time.
We have our ECS service scale based on the depth of messages in the queue. The only real downside of that solution versus native Lambda (well, if you forget the management of Docker images in ECR instead of just the code in Lambda, I suppose) is the one-minute minimum it takes to go from 0 containers to N (where N is however many containers you want), given that AWS SQS metrics have one-minute granularity. But our use case is ETL, so waiting a minute before crunching millions of records' worth of files was not a significant issue.
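To give a feel for scaling on queue depth, here is a minimal sketch of a hand-rolled scaler; the thresholds (`msgs_per_task`, `max_tasks`) and the queue/cluster/service names are hypothetical, and in practice you would more likely drive this through CloudWatch alarms than a loop like this:

```python
import math


def desired_task_count(backlog: int, msgs_per_task: int = 100, max_tasks: int = 10) -> int:
    """Map an SQS queue depth to an ECS desired count (assumed thresholds)."""
    if backlog == 0:
        return 0
    return min(max_tasks, math.ceil(backlog / msgs_per_task))


def scale_service(queue_url: str, cluster: str, service: str) -> None:
    """Read the queue depth (one-minute granularity, as noted above) and resize the service."""
    import boto3  # imported here so desired_task_count stays testable without AWS

    sqs = boto3.client("sqs")
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

    ecs = boto3.client("ecs")
    ecs.update_service(
        cluster=cluster,
        service=service,
        desiredCount=desired_task_count(backlog),
    )
```

Because SQS metrics lag by up to a minute, the zero-to-N cold start mentioned above is unavoidable with this approach; the sizing helper at least lets you scale back to 0 when the queue drains.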
With the newer features in S3 event notifications, I would recommend you figure out whether Lambda or ECS is the right place to run the jobs based on the file attributes you will see in the payload (i.e. file size). But I can't recommend enough using SQS to keep track of the processing jobs for these files, rather than SNS, for which you have to implement retry/replay yourself.
Keep the code capable of running in either place (simply by invoking the code in ECS the same way you would in Lambda) and time will tell which is best for your use case.
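The size-based routing described above can be sketched like this; the S3 event notification really does carry the object size, but the 128 MB threshold is an assumption you would tune to your own Lambda limits:

```python
# Route each object in an S3 event notification to Lambda or ECS by size.
# The threshold below is an assumption, not an AWS limit -- tune it.
SIZE_THRESHOLD = 128 * 1024 * 1024  # 128 MB


def choose_runtime(s3_event: dict) -> list:
    """Return (key, target) pairs, where target is 'lambda' or 'ecs'."""
    decisions = []
    for record in s3_event.get("Records", []):
        obj = record["s3"]["object"]
        target = "lambda" if obj.get("size", 0) <= SIZE_THRESHOLD else "ecs"
        decisions.append((obj["key"], target))
    return decisions
```

A dispatcher Lambda could call this on every notification and either process the file in place or hand it to the SQS queue that feeds the ECS service.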
For my devs who had written the ETL part of the code, all I had to do was re-use the code from this repo that deals with SQS and invoke their lambda_handler(event, context) function with the SQS message payload.
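A minimal sketch of that SQS wrapper, assuming the devs' handler is importable as `lambda_handler` and that reshaping raw SQS messages into the `Records` payload Lambda would normally deliver is enough for your handler (queue URL and timeouts are hypothetical):

```python
def to_lambda_event(messages: list) -> dict:
    """Reshape raw SQS messages into the Records payload Lambda would deliver."""
    return {
        "Records": [
            {
                "messageId": m["MessageId"],
                "receiptHandle": m["ReceiptHandle"],
                "body": m["Body"],
            }
            for m in messages
        ]
    }


def poll_forever(queue_url: str, lambda_handler) -> None:
    """Poll SQS and feed batches to the handler, mimicking what Lambda did for us."""
    import boto3  # imported here so to_lambda_event stays testable without AWS

    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,     # long polling
            VisibilityTimeout=300,  # hide messages while we process them
        )
        messages = resp.get("Messages", [])
        if not messages:
            continue
        lambda_handler(to_lambda_event(messages), None)  # context unused here
        # Delete only after the handler returns, mirroring Lambda's behaviour
        sqs.delete_message_batch(
            QueueUrl=queue_url,
            Entries=[
                {"Id": m["MessageId"], "ReceiptHandle": m["ReceiptHandle"]}
                for m in messages
            ],
        )
```

If the handler raises, the messages are never deleted and reappear after the visibility timeout, which gives you the same retry behaviour Lambda's SQS integration provides.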
I think it's more efficient to save it to S3 directly, but I would really like to hear your pros for EFS before going back to S3. Even if the files are big and time-consuming to extract, I think EFS isn't needed as long as you have enough space in Lambda. Another solution for longer and more memory-consuming processing is AWS Batch with the Fargate launch type.
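Saving directly to S3 without any intermediate filesystem can look like this; the function and bucket names are hypothetical, and the whole archive is held in memory, which is fine for 50 MB zips but is exactly the constraint that pushed the answer above toward ECS for larger files:

```python
import io
import zipfile


def iter_zip_members(zip_bytes: bytes):
    """Yield (filename, data) for each file inside an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                yield info.filename, zf.read(info)


def unzip_to_s3(bucket: str, key: str, dest_bucket: str) -> int:
    """Download a zip from S3 and upload each member straight back to S3."""
    import boto3  # imported here so iter_zip_members stays testable offline

    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    count = 0
    for name, data in iter_zip_members(body):
        s3.put_object(Bucket=dest_bucket, Key=f"{key}/{name}", Body=data)
        count += 1
    return count
```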
We are dealing with zip files of around 50 MB each. The extracted folder may have approximately 1,000 PDF and XML files of around 30 KB each. As soon as the unzipped files are processed and moved to a different S3 bucket, we need to delete the unzipped files in the source S3 bucket. There will be thousands of zip files to process daily.
I was looking at the considerations below:
Let me know if this input helps.
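For the cleanup step described above, S3's DeleteObjects API accepts up to 1,000 keys per request, which conveniently matches the per-archive file count; a sketch, with the bucket and key list as placeholders:

```python
def chunked(items: list, size: int = 1000):
    """Split a key list into batches; S3 DeleteObjects takes at most 1,000 keys per call."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def delete_processed(bucket: str, keys: list) -> None:
    """Batch-delete the unzipped source objects once they have been processed."""
    import boto3  # imported here so chunked stays testable without AWS

    s3 = boto3.client("s3")
    for batch in chunked(keys):
        s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in batch], "Quiet": True},
        )
```

One batched call per extracted archive keeps the request count manageable even at thousands of zips per day.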
I see; if it's only temporary storage, then EFS sounds better, as you said. I thought the S3 location used in Lambda would be the final destination.