S3: bundle large number of small objects


Problem:

            A large number of uncompressed JSON objects in an S3 bucket.
            Around 5 million objects.
            Sizes vary from a few bytes to ~90 MB, totaling 3 TB.

Goal:

         Compress the objects and group them into several bundles by prefix, move each bundle to another bucket, and change the bundle's storage class to Glacier.

Background:

        S3 Inventory is available.
        Objects are organized as:
            s3://<bucket-name>/json/<datasource1>/<year>/<month>/<day>/
            s3://<bucket-name>/json/<datasource2>/<year>/<month>/<day>/
            s3://<bucket-name>/json/<datasource3>/<year>/<month>/<day>/
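Since S3 Inventory is available and the keys follow a fixed json/<datasource>/<year>/<month>/<day>/ layout, one way to plan the bundles is to group the inventory rows by that day prefix and split any group that exceeds a size cap. The sketch below assumes inventory rows of the form bucket, key, size (a hypothetical column order; the real order is given by the fileSchema in the inventory manifest). The function names, the 5 GiB default cap, and the part-NNNN.tar.gz naming are illustrative choices, not part of any AWS tool.

```python
import csv
import io
from collections import defaultdict


def bundle_prefix(key: str, depth: int = 5) -> str:
    """Return the first `depth` path segments, e.g. json/<ds>/<yyyy>/<mm>/<dd>."""
    return "/".join(key.split("/")[:depth])


def plan_bundles(inventory_csv: str, max_bundle_bytes: int = 5 * 1024**3) -> dict:
    """Map bundle keys to the object keys they should contain.

    Assumes each CSV row starts with: bucket, key, size (hypothetical schema).
    Note: key names in CSV inventory reports may be URL-encoded; decode if needed.
    """
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(inventory_csv)):
        if not row:  # skip blank lines
            continue
        _, key, size = row[:3]
        groups[bundle_prefix(key)].append((key, int(size)))

    plan = {}
    for prefix, objs in sorted(groups.items()):
        part, current, current_size = 0, [], 0
        for key, size in sorted(objs):
            # Start a new bundle when adding this object would exceed the cap.
            # A single object larger than the cap still gets its own bundle.
            if current and current_size + size > max_bundle_bytes:
                plan[f"{prefix}/part-{part:04d}.tar.gz"] = current
                part, current, current_size = part + 1, [], 0
            current.append(key)
            current_size += size
        if current:
            plan[f"{prefix}/part-{part:04d}.tar.gz"] = current
    return plan
```

Grouping by day keeps each bundle tied to one prefix, which makes it easier to find and restore a specific date range from Glacier later.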

I am unsure which solution is best: implementing one based on https://github.com/amazon-archives/s3bundler and running it on demand on EC2, using S3 Batch Operations, or some other approach.

Thanks.

Gjoe
asked 4 months ago
1 Answer

Hello.

I don't see a problem with building and running s3bundler on EC2.
Another option is to do the compression in an AWS Glue job written in Python:
https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
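A minimal sketch of the kind of Python such a job (or a script on EC2) might run for one bundle, assuming the list of keys for that bundle is already known and the job's role can read the source bucket and write the destination bucket. The function names are hypothetical; `get_object` and `upload_fileobj` are standard boto3 S3 client calls, and "GLACIER" is the S3 storage class for Glacier Flexible Retrieval ("DEEP_ARCHIVE" is the cheaper, slower-to-restore alternative). The archive is written to a temporary file rather than memory, since a bundle can be several GB.

```python
import io
import tarfile
import tempfile


def build_bundle(objects):
    """Pack (key, body_bytes) pairs into a gzip-compressed tar in a temp file."""
    buf = tempfile.TemporaryFile()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for key, body in objects:
            info = tarfile.TarInfo(name=key)  # keep the original S3 key as the path
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
    buf.seek(0)  # rewind so the caller can read/upload from the start
    return buf


def bundle_and_upload(src_bucket, keys, dest_bucket, bundle_key):
    """Download `keys`, bundle them, and upload the bundle as a Glacier object."""
    import boto3  # imported here so build_bundle works without boto3 installed

    s3 = boto3.client("s3")
    # Generator: each object is read only when tarfile needs it.
    objects = ((k, s3.get_object(Bucket=src_bucket, Key=k)["Body"].read()) for k in keys)
    with build_bundle(objects) as bundle:
        s3.upload_fileobj(
            bundle, dest_bucket, bundle_key, ExtraArgs={"StorageClass": "GLACIER"}
        )
```

Because the goal is to move the data, the source objects would still need to be deleted (for example with `delete_objects`, which accepts up to 1,000 keys per request) once the bundle upload has been verified.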

answered 4 months ago
