S3: bundle a large number of small objects


Problem:

        A large number of uncompressed JSON objects in an S3 bucket.
        Around 5 million objects.
        Sizes vary from a few bytes to ~90 MB, totalling 3 TB.

Goal:

        Compress and bundle the objects into several bundles by prefix, move the bundles to another bucket, and change the bundles' storage class to Glacier.

Background:

        S3 Inventory is available.
        Objects are organized as:
            s3://<bucket-name>/json/<datasource1>/<year>/<month>/<day>/
            s3://<bucket-name>/json/<datasource2>/<year>/<month>/<day>/
            s3://<bucket-name>/json/<datasource3>/<year>/<month>/<day>/
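
To make "bundle by prefix" concrete for this layout, here is a minimal Python sketch that groups object keys (for example, keys read from the S3 Inventory report) into one group per datasource per month. The monthly granularity and the names `bundle_prefix` and `group_keys_by_prefix` are assumptions for illustration only; they are not part of s3bundler or any AWS API.

```python
from collections import defaultdict


def bundle_prefix(key, depth=4):
    """Return the first `depth` path components of an S3 key.

    With the layout json/<datasource>/<year>/<month>/<day>/..., depth=4
    yields one bundle per datasource per month (an assumed granularity).
    """
    parts = key.split("/")
    return "/".join(parts[:depth])


def group_keys_by_prefix(keys, depth=4):
    """Map each bundle prefix to the sorted list of keys under it."""
    groups = defaultdict(list)
    for key in keys:
        groups[bundle_prefix(key, depth)].append(key)
    return {prefix: sorted(members) for prefix, members in groups.items()}
```

Changing `depth` to 3 would give one bundle per datasource per year, and 5 would give one per day; which is appropriate depends on how large each resulting bundle may get.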

I am unsure of the best solution: should I implement something based on https://github.com/amazon-archives/s3bundler on EC2 and run it on demand, can S3 Batch Operations be used, or is there another approach?

Thanks.

Gjoe
asked 4 months ago, 166 views
1 Answer

Hello.

I don't see any problem with running s3bundler on EC2 as you describe.
Another option is to compress the objects with an AWS Glue job written in Python.
https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
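
Whichever environment runs it (a Glue Python job or EC2), the compress-and-bundle step could look roughly like the sketch below. It assumes a boto3 S3 client (`get_object`, `upload_file`), reads each object fully into memory, and writes the tar.gz to local disk, so it needs enough RAM for the largest object and enough disk for one bundle. The function names are hypothetical. Passing `StorageClass` as `GLACIER` at upload time is one way to place the bundle in Glacier without a separate storage class change.

```python
import io
import tarfile


def build_bundle(s3, bucket, keys, out_path):
    """Download each key and append it to a gzip-compressed tar at out_path.

    `s3` is any object exposing get_object(Bucket=..., Key=...) the way
    boto3's S3 client does. Each object is read fully into memory.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for key in keys:
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            info = tarfile.TarInfo(name=key)  # keep the original key as the member name
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
    return out_path


def upload_bundle(s3, out_path, dest_bucket, dest_key):
    """Upload the bundle directly into the GLACIER storage class."""
    s3.upload_file(out_path, dest_bucket, dest_key,
                   ExtraArgs={"StorageClass": "GLACIER"})
```

With real credentials, `s3` would be `boto3.client("s3")`, and the source bucket, destination bucket, and destination key are whatever names apply in your account. Deleting the source objects after verifying the bundle is left out of this sketch.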

EXPERT
answered 4 months ago
