How to export RDS table data to S3 as a Parquet file without using snapshots


I'm trying to copy data from an AWS RDS table to an external cloud storage service for analysis. (I want to use the storage service as a data lake.) I prefer Parquet to CSV as the file format for the copied data because it is easier to transform. I think an RDS snapshot is the best solution, but a snapshot cannot be limited to specific table columns, so columns that should be masked are output as-is, which is a security concern.

So I would like to know whether there is a way to output RDS table data to S3 as a Parquet file without using a snapshot. Thank you very much for your help.


asked 2 years ago, 3482 views
2 Answers

Please take a look at https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html. It lets you selectively send data to a file in S3. However, the output is a text/CSV file, so you can add a Lambda function that is triggered when a file is generated and converts the text/CSV file to Parquet format.
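The Lambda conversion step could look roughly like the sketch below. It is only an illustration under several assumptions that are not part of the linked documentation: the function is triggered by S3 object-created events on a hypothetical `csv/` prefix, the export was written with a header row, `pyarrow` is available to the function (for example through a Lambda layer), and each file fits in the function's memory. The names `parquet_key_for`, `handler`, `csv/` and `parquet/` are made up for the example.

```python
import io
import urllib.parse


def parquet_key_for(csv_key, src_prefix="csv/", dst_prefix="parquet/"):
    """Map an exported CSV object key to the key of its Parquet copy.

    Writing under a different prefix keeps the output from re-triggering
    a Lambda that listens only on the source prefix.
    """
    if not csv_key.startswith(src_prefix):
        raise ValueError(f"unexpected key outside {src_prefix!r}: {csv_key!r}")
    return dst_prefix + csv_key[len(src_prefix):] + ".parquet"


def handler(event, context):
    # Imported here so the key helper above can be used without these packages.
    import boto3
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    s3 = boto3.client("s3")
    written = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read the whole CSV object; the first row is taken as column names.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        table = pv.read_csv(io.BytesIO(body))

        # Serialize the table as Parquet in memory and upload it.
        out = io.BytesIO()
        pq.write_table(table, out)
        target = parquet_key_for(key)
        s3.put_object(Bucket=bucket, Key=target, Body=out.getvalue())
        written.append(target)
    return {"written": written}
```

Because the columns are chosen in the `SELECT` statement that produces the CSV, masked columns never reach S3 in either format.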

AWS
answered 2 years ago
  • Thank you for your answer. I think it is a practical way. However, it is a pity that there is no way to export directly in Parquet format.


In addition to the Save to S3 option, you can also write a Glue job that queries the RDS data and stores it in S3 in Parquet format. Some helpful information here.
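A Glue job along these lines might look like the following sketch. It assumes the RDS table is already registered in the Glue Data Catalog (for example by a crawler over a JDBC connection) and that the job is started with the hypothetical parameters `--database`, `--table`, `--output_path` and `--masked_columns`; those parameter names, `parse_column_list` and `run_job` are illustrative, not part of any AWS template.

```python
import sys


def parse_column_list(raw):
    """Split a comma-separated job argument into clean column names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def run_job():
    # Glue-only imports, kept inside the function so the helper works elsewhere.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(
        sys.argv,
        ["JOB_NAME", "database", "table", "output_path", "masked_columns"],
    )
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the RDS table through its Data Catalog entry.
    frame = glue_context.create_dynamic_frame.from_catalog(
        database=args["database"], table_name=args["table"]
    )
    # Drop sensitive columns before anything is written to S3.
    frame = frame.drop_fields(paths=parse_column_list(args["masked_columns"]))

    # Write the remaining columns to S3 as Parquet files.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": args["output_path"]},
        format="parquet",
    )
    job.commit()


# In a deployed Glue script the last line would be: run_job()
```

Unlike the CSV route, this writes Parquet directly, and dropping the masked columns inside the job addresses the concern about snapshots exporting every column.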

AWS
answered 2 years ago
