Compression of Parquet files in an S3 bucket

I have a data pipeline that takes data from RudderStack and puts it into an S3 data lake. In a short time period, my 80k events are taking up 2.3 GB.

Is there some way I can compress those objects? Ideally, I want data arriving in S3 to be compressed automatically to reduce storage charges.

1 Answer

Hello.

Since S3 does not compress objects on its own, a good approach is to build a mechanism that compresses a file with Lambda or Glue after it is uploaded to S3 and writes the result back.
When using Lambda, it can be invoked through S3 event notifications:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html
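For example, here is a minimal Lambda sketch (assuming a Python runtime with boto3, pyarrow packaged as a Lambda layer, and a placeholder `compressed/` output prefix) that rewrites each newly uploaded Parquet file with a stronger column codec:

```python
import io
import urllib.parse

import boto3
import pyarrow.parquet as pq  # assumption: provided via a Lambda layer

s3 = boto3.client("s3")

def handler(event, context):
    # Invoked by an s3:ObjectCreated:* event notification.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Skip objects we already rewrote, so re-uploading into the
        # same bucket does not trigger the function in a loop.
        if key.startswith("compressed/"):
            continue

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        table = pq.read_table(io.BytesIO(body))

        # Parquet compresses per column chunk; zstd usually shrinks
        # uncompressed or snappy-compressed event data noticeably.
        buf = io.BytesIO()
        pq.write_table(table, buf, compression="zstd")

        s3.put_object(
            Bucket=bucket,
            Key=f"compressed/{key}",
            Body=buf.getvalue(),
        )
```

Scoping the event notification to the raw prefix (or filtering out the `compressed/` prefix as above) matters, because writing the output back into the same bucket would otherwise re-trigger the function indefinitely.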

With AWS Glue, the same is possible by using EventBridge or a similar trigger to run the job:
https://repost.aws/questions/QUsQRxOITUR_qj2uAe3_NUoA/trigger-glue-job-from-s3
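As a rough sketch of the Glue side (assuming a Glue 3.0+ PySpark job; the S3 paths are placeholders), the job reads the raw prefix and rewrites it with compression, which also consolidates many small event files into fewer, larger ones:

```python
import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
spark = GlueContext(SparkContext()).spark_session

# Placeholder paths -- point these at your raw and compressed prefixes.
df = spark.read.parquet("s3://my-datalake/raw/")

(df.repartition(8)  # fewer, larger output files
   .write.mode("overwrite")
   .option("compression", "zstd")  # Spark 3 / Glue 3.0+ supports zstd
   .parquet("s3://my-datalake/compressed/"))
```

An EventBridge rule on the bucket's Object Created events (or a simple schedule) can then start this job, per the link above.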

