Problem:
A large number of uncompressed JSON objects in an S3 bucket:
around 5 million objects, with sizes varying from a few bytes to ~90 MB, totaling ~3 TB.
Goal:
Compress and bundle the objects into several archives grouped by prefix, move each bundle to another bucket, and change the bundle's storage class to Glacier.
Background:
S3 Inventory is available.
Objects are organized as s3://<bucket-name>/json/<datasource1>/<year>/<month>/<day>/
s3://<bucket-name>/json/<datasource2>/<year>/<month>/<day>/
s3://<bucket-name>/json/<datasource3>/<year>/<month>/<day>/
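For illustration, given the key layout above, the S3 Inventory listing could be grouped into one bundle per datasource per day with a small helper like this (a sketch; the function name and sample keys are mine, not part of any AWS API):

```python
from collections import defaultdict

def group_keys_by_prefix(keys, depth=5):
    """Group keys of the form json/<datasource>/<year>/<month>/<day>/file.json
    by their first `depth` path components, i.e. one bundle per day."""
    bundles = defaultdict(list)
    for key in keys:
        prefix = "/".join(key.split("/")[:depth])
        bundles[prefix].append(key)
    return dict(bundles)

# Hypothetical keys matching the layout in this post:
keys = [
    "json/ds1/2023/01/01/a.json",
    "json/ds1/2023/01/01/b.json",
    "json/ds1/2023/01/02/c.json",
    "json/ds2/2023/01/01/d.json",
]
print(group_keys_by_prefix(keys))
```

Each resulting group would then become one archive; the inventory report gives you the full key list without having to call ListObjects against 5 million objects.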
I am unsure about the best approach: should I implement a solution based on https://github.com/amazon-archives/s3bundler, run on demand on EC2, or can S3 Batch Operations be used, or is there some other option?
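Whatever orchestration is chosen, the core bundling step is just streaming the objects of one prefix into a compressed tar. A minimal local sketch (the helper and sample data are placeholders; in a real job each member's bytes would come from `get_object` and the result would be uploaded with `put_object(..., StorageClass="GLACIER")`):

```python
import io
import tarfile

def bundle(objects):
    """Pack a {key: bytes} mapping into an in-memory .tar.gz archive.

    In a real pipeline the values would be streamed from S3 rather than
    held in memory, since individual objects can be ~90 MB.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in objects.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Hypothetical single-object bundle:
archive = bundle({"json/ds1/2023/01/01/a.json": b'{"k": 1}'})
print(len(archive))
```

For very large bundles, writing the tar to a temporary file (or multipart-uploading it in chunks) would avoid holding the whole archive in memory.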
Thanks.