S3: bundle a large number of small objects

Problem:

    A large number of uncompressed JSON objects in an S3 bucket.
    Around 5 million objects.
    Sizes vary from a few bytes to ~90 MB, totaling 3 TB.

Goal:

    Compress and bundle the objects into several bundles by prefix, move the bundles to another bucket, and change the storage class of the bundles to Glacier.

Background:

    S3 Inventory is available.
    Objects are organized as:
        s3://<bucket-name>/json/<datasource1>/<year>/<month>/<day>/
        s3://<bucket-name>/json/<datasource2>/<year>/<month>/<day>/
        s3://<bucket-name>/json/<datasource3>/<year>/<month>/<day>/
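To illustrate what "bundle by prefix" could mean with this layout, here is a minimal sketch that groups object keys by their first path components. It assumes the inventory has already been parsed into (key, size) pairs; the function names and the choice of one bundle per json/<datasource>/<year>/<month> are hypothetical and only for illustration.

```python
from collections import defaultdict


def bundle_prefix(key, depth=4):
    """Return the first `depth` path components of an object key.

    With depth=4 and the layout above this yields
    json/<datasource>/<year>/<month> (an assumed bundle granularity).
    """
    return "/".join(key.split("/")[:depth])


def group_keys(rows, depth=4):
    """Group (key, size) pairs by bundle prefix.

    Returns a dict mapping prefix -> {"keys": [...], "total_size": int}.
    """
    groups = defaultdict(lambda: {"keys": [], "total_size": 0})
    for key, size in rows:
        group = groups[bundle_prefix(key, depth)]
        group["keys"].append(key)
        group["total_size"] += size
    return dict(groups)


# Hypothetical sample rows, standing in for parsed inventory entries.
sample_rows = [
    ("json/datasource1/2023/01/05/a.json", 10),
    ("json/datasource1/2023/01/06/b.json", 20),
    ("json/datasource2/2023/01/05/c.json", 5),
]
sample_groups = group_keys(sample_rows)
```

The total size per group helps decide whether a prefix should be split further (for example by day) to keep each bundle manageable.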

I am unsure of the best solution: should I implement something based on https://github.com/amazon-archives/s3bundler on EC2 and run it on demand, can S3 Batch Operations be used, or is there another approach?

Thanks.

Gjoe
Asked 4 months ago · 166 views
1 Answer

Hello.

I don't see a problem with running s3bundler on EC2.
Another option is to compress the objects with a Python job in AWS Glue:
https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
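As a rough idea of what the Python side might look like, the sketch below writes a set of objects into one gzip-compressed tar archive and uploads it with the Glacier storage class. It is not taken from s3bundler or the Glue documentation; the function names, the in-memory (key, bytes) input, and the bucket and key arguments are assumptions made for illustration. boto3 is imported lazily so that the bundling part runs without it.

```python
import io
import tarfile


def write_bundle(objects, fileobj):
    """Write (key, data) pairs into a gzip-compressed tar archive.

    `objects` is an iterable of (key, bytes) pairs; `fileobj` is any
    writable binary file object. Returns the number of members written.
    """
    count = 0
    with tarfile.open(fileobj=fileobj, mode="w:gz") as tar:
        for key, data in objects:
            info = tarfile.TarInfo(name=key)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            count += 1
    return count


def upload_bundle(fileobj, dest_bucket, dest_key):
    """Upload a bundle to the destination bucket as a GLACIER object.

    Bucket and key names are supplied by the caller (hypothetical here).
    """
    import boto3  # lazy import: only needed for the upload step

    s3 = boto3.client("s3")
    fileobj.seek(0)
    s3.upload_fileobj(
        fileobj, dest_bucket, dest_key,
        ExtraArgs={"StorageClass": "GLACIER"},
    )


# Local demonstration of the bundling step only (no AWS calls).
demo_objects = [
    ("json/datasource1/2023/01/05/a.json", b'{"id": 1}'),
    ("json/datasource1/2023/01/06/b.json", b'{"id": 2}'),
]
demo_buffer = io.BytesIO()
demo_count = write_bundle(demo_objects, demo_buffer)
```

For bundles of real size, a temporary file opened in binary mode is likely a better target than an in-memory buffer, and the objects would be streamed from S3 rather than held in a list.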

Expert
Answered 4 months ago
