S3: bundle a large number of small objects


Problem:

            A large number of uncompressed JSON objects in an S3 bucket.
            Around 5 million objects.
            Sizes range from a few bytes to ~90 MB, totaling about 3 TB.

Goal:

         Compress and bundle the objects into several bundles by prefix, move the bundles to another bucket, and change the storage class of the bundles to Glacier.

Background:

        S3 Inventory is available.
        Objects are organized as:
            s3://<bucket-name>/json/<datasource1>/<year>/<month>/<day>/
            s3://<bucket-name>/json/<datasource2>/<year>/<month>/<day>/
            s3://<bucket-name>/json/<datasource3>/<year>/<month>/<day>/

I am unsure of the best approach: should I implement something based on https://github.com/amazon-archives/s3bundler on EC2 and run it on demand, can S3 Batch Operations be used, or is there some other way?
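To make the grouping concrete, below is a minimal sketch of how I picture splitting the keys into bundles by day prefix from the inventory. The file name is a placeholder, and I am assuming the default inventory CSV layout (bucket, key, ... with URL-encoded keys):

```python
import csv
import gzip
from collections import defaultdict
from urllib.parse import unquote_plus

# Group keys from one downloaded S3 Inventory data file by day prefix,
# e.g. json/<datasource>/<year>/<month>/<day> -> keys that go into one bundle.
# "inventory-data.csv.gz" is a placeholder name for an inventory data file.
bundles = defaultdict(list)
with gzip.open("inventory-data.csv.gz", "rt", newline="") as f:
    for row in csv.reader(f):
        key = unquote_plus(row[1])  # column 0 is the bucket, column 1 the key
        prefix = "/".join(key.split("/")[:5])  # json/<datasource>/<year>/<month>/<day>
        bundles[prefix].append(key)

for prefix, keys in bundles.items():
    print(prefix, len(keys), "objects")
```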

Thanks.

Gjoe
Asked 4 months ago, 165 views
1 Answer

Hello.

I don't see any problem with your approach of building s3bundler on EC2.
Another option is to do the compression with a Python job on AWS Glue:
https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
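As a rough illustration only (not a production design), the per-prefix bundling in such a Python job could look like the sketch below. The bucket names and key layout are placeholders, and the whole bundle is buffered in memory, so very large prefixes would need a temp file or multipart upload instead:

```python
import io
import tarfile
import boto3

s3 = boto3.client("s3")

def bundle_prefix(src_bucket: str, prefix: str, dst_bucket: str) -> None:
    """Pack every object under `prefix` into one tar.gz and upload it to the
    destination bucket directly in the GLACIER storage class."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                body = s3.get_object(Bucket=src_bucket, Key=obj["Key"])["Body"].read()
                info = tarfile.TarInfo(name=obj["Key"])
                info.size = len(body)
                tar.addfile(info, io.BytesIO(body))
    buf.seek(0)
    s3.put_object(
        Bucket=dst_bucket,
        Key=prefix.rstrip("/") + ".tar.gz",
        Body=buf,
        StorageClass="GLACIER",  # the bundle lands in Glacier without a separate transition
    )

# Hypothetical usage, one call per prefix taken from the S3 Inventory:
# bundle_prefix("source-bucket", "json/datasource1/2023/01/01/", "archive-bucket")
```

You would drive one such call per prefix derived from the S3 Inventory, and delete the source objects only after verifying each bundle.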

Expert
Answered 4 months ago
