S3: bundle a large number of small objects


Problem:

            A large number of uncompressed JSON objects in an S3 bucket:
            around 5 million objects,
            each ranging from a few bytes to ~90 MB, totaling ~3 TB.

Goal:

         Compress and bundle the objects into several bundles by prefix, move each bundle to another bucket, and change the bundle's storage class to Glacier.

Background:

        S3 Inventory is available.
        Objects are organized as   s3://<bucket-name>/json/<datasource1>/<year>/<month>/<day>/
                                   s3://<bucket-name>/json/<datasource2>/<year>/<month>/<day>/
                                   s3://<bucket-name>/json/<datasource3>/<year>/<month>/<day>/

I am puzzled about the best solution: should I implement a solution based on https://github.com/amazon-archives/s3bundler on EC2 and run it on demand, can S3 Batch Operations be used, or is there some other approach?
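Whichever runner ends up doing the work (EC2, S3 Batch Operations, or Glue), the core bundling step is the same: stream each object's bytes into one gzip-compressed tar archive per prefix, then upload the archive with the GLACIER storage class. A minimal sketch of that step using only the standard library — the S3 download/upload calls are left as comments, since the bucket and key names there are placeholders, not anything from this thread:

```python
import io
import tarfile

def bundle_to_targz(objects):
    """Pack an iterable of (key, payload_bytes) pairs into one gzip-compressed tar.

    In the real job each payload would come from s3.get_object(...)["Body"].read();
    the function is kept pure (bytes in, bytes out) so it is easy to test locally.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for key, payload in objects:
            info = tarfile.TarInfo(name=key)   # keep the original key as the member name
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

# Hypothetical upload step (boto3; bucket and key are placeholders):
# s3.put_object(
#     Bucket="archive-bucket",
#     Key="bundles/datasource1/2023/01.tar.gz",
#     Body=bundle_to_targz(objs),
#     StorageClass="GLACIER",
# )
```

Uploading directly with `StorageClass="GLACIER"` avoids a separate storage-class transition after the copy.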

Thanks.

Gjoe
Asked 4 months ago · 166 views
1 Answer

Hello.

I don't see a problem with building a solution based on s3bundler on EC2 as you describe.
Another option is to do the compression with a Python job on AWS Glue:
https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
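Since S3 Inventory is already available, the inventory report's key column can drive the job: group the keys by their `json/<datasource>/<year>/<month>` prefix so that each group becomes one bundle. A sketch of that grouping, assuming the key layout from the question (the prefix depth of 4 here is a choice — per-day or per-datasource bundles would just use a different depth):

```python
from collections import defaultdict

def group_keys_by_prefix(keys, depth=4):
    """Group object keys by their first `depth` path components,
    e.g. json/<datasource>/<year>/<month> -> list of keys.

    Each resulting group is the set of objects that go into one bundle.
    """
    groups = defaultdict(list)
    for key in keys:
        prefix = "/".join(key.split("/")[:depth])
        groups[prefix].append(key)
    return dict(groups)
```

For example, keys under `json/ds1/2023/01/...` all land in the `json/ds1/2023/01` group, which would become a single monthly archive for that datasource.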

Expert
Answered 4 months ago
