Data replication from RDS (pg) to S3 bucket in timeseries format


Dear Friends,

I need to replicate a specific table from RDS (PostgreSQL) to an S3 bucket in another AWS account, to build time series data for analytics or a data lake. I am looking for an option that performs well for millions of records without exporting from PostgreSQL to CSV.

Thanks in advance.

1 Answer

Hey there!

To replicate a specific table from Amazon RDS (PostgreSQL) to another AWS account's S3 without exporting to CSV, you can use AWS Database Migration Service (DMS) in combination with AWS Glue. This approach is efficient for handling millions of records and supports continuous data replication.

  • DMS is well suited to replicating data from one database to another for analytics, backup, or operational purposes.
  • DMS can replicate time series (time-based) data from a source database to a target.
  • You can use an ongoing replication task in DMS, such as Full load + CDC, which migrates the existing data and then keeps the target updated with changes made to the source database.

To do so, follow this high-level guideline:

  1. Set up AWS Database Migration Service (DMS) by configuring a replication instance and creating a source endpoint for the source database.
  2. Configure a target endpoint in the destination AWS account, then create a replication task that defines the replication for the desired table.
  3. To export the data to S3 and prepare it for analytics, utilise AWS Glue by creating a job that transforms the table data and saves it in a format such as Parquet, JSON, or ORC.
  4. Schedule the AWS Glue job to run at specified intervals so that the data in S3 stays up to date for analytics or data lake purposes.
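As a rough illustration of the replication task step, here is a minimal sketch that assembles the request for boto3's `create_replication_task` call, with a table mapping that selects a single table and the Full load + CDC migration type. The task name, the ARNs, and the `public.events` table are hypothetical placeholders, not values from the question; the sketch only builds and prints the request, so it runs without AWS credentials.

```python
import json


def build_table_mappings(schema: str, table: str) -> str:
    """Return a DMS table-mapping JSON document that selects one table."""
    mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-single-table",
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(mappings)


def build_task_request(task_id, source_arn, target_arn, instance_arn, schema, table):
    """Assemble keyword arguments for boto3's dms.create_replication_task."""
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # Full load of the existing rows, followed by ongoing change capture.
        "MigrationType": "full-load-and-cdc",
        "TableMappings": build_table_mappings(schema, table),
    }


if __name__ == "__main__":
    # All identifiers below are hypothetical placeholders.
    request = build_task_request(
        task_id="events-to-target",
        source_arn="arn:aws:dms:REGION:ACCOUNT:endpoint:SOURCE",
        target_arn="arn:aws:dms:REGION:ACCOUNT:endpoint:TARGET",
        instance_arn="arn:aws:dms:REGION:ACCOUNT:rep:INSTANCE",
        schema="public",
        table="events",
    )
    print(json.dumps(request, indent=2))
    # With credentials configured, the request could be submitted as:
    #   import boto3
    #   boto3.client("dms").create_replication_task(**request)
```

Keeping the request as a plain dictionary makes it easy to review or version-control the task definition before anything is created in the account.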

When replicating time series data with DMS, you configure the replication task to capture the time-based data from the source database and replicate it to the target. The key is to make sure the task configuration preserves the timestamp columns and the table structure, so that the time series data arrives accurately.
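To make the point about timestamps more concrete, here is a small sketch of one common way to lay out time series data in S3: deriving a date-partitioned object key from each record's timestamp. The Hive-style `year=/month=/day=` layout, the `lake` prefix, and the function name are my own assumptions for illustration, not something DMS or Glue requires.

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, table: str, event_time: datetime, filename: str) -> str:
    """Build a date-partitioned S3 key from a timezone-aware timestamp."""
    if event_time.tzinfo is None:
        # Naive timestamps are ambiguous, so refuse them rather than guess.
        raise ValueError("event_time must be timezone-aware")
    # Normalise to UTC so every writer agrees on the partition boundaries.
    utc = event_time.astimezone(timezone.utc)
    return (
        f"{prefix}/{table}/"
        f"year={utc:%Y}/month={utc:%m}/day={utc:%d}/{filename}"
    )


if __name__ == "__main__":
    ts = datetime(2024, 3, 5, 23, 30, tzinfo=timezone.utc)
    print(partitioned_key("lake", "events", ts, "part-0000.parquet"))
```

Partitioning by date like this lets query engines skip whole prefixes when a query filters on a time range.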

Whether you're migrating time series data to a different type of database or replicating it within the same database type, DMS offers the necessary capabilities to handle the data replication efficiently. You can configure transformations and mappings to suit the requirements of the target database, enabling seamless migration and replication of your time series data.
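For the transformations and mappings mentioned above, a minimal sketch of a DMS table mapping that selects one table and renames it on the target might look like this. The schema, table, and target names are hypothetical, and the helper function is only there for illustration.

```python
import json


def build_mappings_with_rename(schema: str, table: str, new_table_name: str) -> str:
    """Return DMS table mappings: select one table, then rename it on the target."""
    locator = {"schema-name": schema, "table-name": table}
    rules = [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-single-table",
            "object-locator": locator,
            "rule-action": "include",
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "rename-table",
            "rule-target": "table",
            "object-locator": locator,
            "rule-action": "rename",
            "value": new_table_name,
        },
    ]
    return json.dumps({"rules": rules}, indent=2)


if __name__ == "__main__":
    # Hypothetical names; replace them with your own schema and table.
    print(build_mappings_with_rename("public", "events", "events_timeseries"))
```

The resulting JSON string is what you would pass as the table mappings when creating or modifying the replication task.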

Hope this answer helps!

AWS
Answered 7 months ago
