Hello.
Of course I could create a new CSV file and overwrite the old one in S3 10,000 times, but I guess that would incur unnecessary time and cost?
Since S3 charges based on the number of requests, this type of operation may result in high costs, depending on how many times the process is performed per day.
https://aws.amazon.com/s3/pricing/?nc1=h_ls
Another option is to temporarily store data using DynamoDB, but this may be more expensive than S3.
https://aws.amazon.com/dynamodb/pricing/?nc1=h_ls
Therefore, I think it would be a good idea to save the individual records to S3 and then combine them into a single CSV later using batch processing.
I could just create 10,000 single-line CSV files in S3, download them to my computer, and patch them together myself.
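A minimal sketch of that manual approach, assuming the one-line files live under a hypothetical prefix like `parts/` in your bucket (the bucket name, prefix, and function names here are illustrative, not from the thread):

```python
def merge_csv_parts(parts):
    """Join one-line CSV fragments, ensuring exactly one newline per row."""
    return "".join(p if p.endswith("\n") else p + "\n" for p in parts)

def download_and_merge(bucket, prefix, out_path):
    """List every one-line CSV under `prefix`, download it, and stitch locally."""
    import boto3  # imported lazily so merge_csv_parts stays dependency-free
    s3 = boto3.client("s3")
    parts = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            parts.append(body.decode("utf-8"))
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(merge_csv_parts(parts))
```

Note this still costs one GET per file, so the request-pricing concern above applies to the downloads as well.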
A variation of this would be for each invocation of your function to write a one-line CSV to a different area of the bucket, and then configure an S3 Event Notification that triggers another Lambda function to append the contents of each new file to the "main" CSV.
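Sketched below, with hypothetical bucket/key names. One caveat worth stating in code: S3 objects are immutable, so "append" really means read-modify-write, and concurrent invocations can overwrite each other's updates, which is exactly why serializing the events (as in the Kinesis option) is safer:

```python
def keys_from_s3_event(event):
    """Extract (bucket, key) pairs from an S3 Event Notification payload."""
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
    ]

MAIN_BUCKET = "my-bucket"        # hypothetical
MAIN_KEY = "main/combined.csv"   # hypothetical

def handler(event, context):
    """Triggered by S3 Event Notification; folds each new one-line CSV
    into the main CSV via read-modify-write (NOT safe under concurrency)."""
    import boto3
    s3 = boto3.client("s3")
    for bucket, key in keys_from_s3_event(event):
        new_line = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        try:
            main = s3.get_object(Bucket=MAIN_BUCKET, Key=MAIN_KEY)["Body"].read()
        except s3.exceptions.NoSuchKey:
            main = b""
        if main and not main.endswith(b"\n"):
            main += b"\n"
        s3.put_object(Bucket=MAIN_BUCKET, Key=MAIN_KEY, Body=main + new_line)
```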
An even better solution, although more complicated and possibly more expensive, is to put these S3 Events into a Kinesis Stream, and have that Kinesis Stream trigger the Lambda function that appends to the main CSV file (not my idea, credit to https://stackoverflow.com/a/42693053 ).
Or, depending on how frequently the original function runs and how up-to-date the main CSV file must be kept, instead of an S3 Event Notification you could use EventBridge Scheduler to run a function that does a sweep of all the one-line CSVs every minute (or whatever it needs to be) and then does a bulk append into the main CSV.
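A sketch of that scheduled sweep, again with hypothetical bucket, prefix, and key names. The bulk append is a single read-modify-write, and the processed parts are deleted afterward so the next sweep starts clean:

```python
def sweep_keys(all_keys, prefix, main_key):
    """Select the one-line part files to fold into the main CSV this sweep."""
    return sorted(k for k in all_keys if k.startswith(prefix) and k != main_key)

def sweep(bucket, prefix, main_key="main/combined.csv"):
    """Run on a schedule by EventBridge Scheduler: bulk-append all pending
    one-line CSVs to the main CSV, then delete the processed parts."""
    import boto3
    s3 = boto3.client("s3")
    keys = []
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix)
    for page in pages:
        keys += [o["Key"] for o in page.get("Contents", [])]
    targets = sweep_keys(keys, prefix, main_key)
    if not targets:
        return
    lines = []
    for k in targets:
        body = s3.get_object(Bucket=bucket, Key=k)["Body"].read().decode("utf-8")
        lines.append(body if body.endswith("\n") else body + "\n")
    try:
        main = s3.get_object(Bucket=bucket, Key=main_key)["Body"].read().decode("utf-8")
    except s3.exceptions.NoSuchKey:
        main = ""
    s3.put_object(Bucket=bucket, Key=main_key, Body=(main + "".join(lines)).encode("utf-8"))
    # delete_objects accepts up to 1000 keys per request
    s3.delete_objects(Bucket=bucket, Delete={"Objects": [{"Key": k} for k in targets]})
```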
I would recommend using Step Functions with a Distributed Map state. The Map state will iterate over the files in S3 and process each one with a Lambda function that generates one line. You will then have a single Lambda, after the Map state, that collects all the results and writes a single CSV file to S3.
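A rough ASL (Amazon States Language) sketch of that state machine; the bucket, prefix, and Lambda ARNs are placeholders you would replace with your own:

```json
{
  "StartAt": "ProcessFiles",
  "States": {
    "ProcessFiles": {
      "Type": "Map",
      "ItemReader": {
        "Resource": "arn:aws:states:::s3:listObjectsV2",
        "Parameters": { "Bucket": "my-bucket", "Prefix": "parts/" }
      },
      "ItemProcessor": {
        "ProcessorConfig": { "Mode": "DISTRIBUTED", "ExecutionType": "STANDARD" },
        "StartAt": "GenerateLine",
        "States": {
          "GenerateLine": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:generate-line",
            "End": true
          }
        }
      },
      "Next": "CollectResults"
    },
    "CollectResults": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:write-combined-csv",
      "End": true
    }
  }
}
```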

You can also use AWS Glue, among other services, to process the saved CSV files. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-csv-home.html