Data replication from RDS (pg) to S3 bucket in timeseries format


Dear Friends,

I need to copy/replicate a specific table from RDS (PostgreSQL) to an S3 bucket in another AWS account, to build time series data for analytics or a data lake. I am looking for an option that performs well for millions of records without running a pg export to CSV.

Thanks in advance.

1 Answer

Hey there!

To replicate a specific table from Amazon RDS (PostgreSQL) to an S3 bucket in another AWS account without running a manual CSV export, you can use AWS Database Migration Service (DMS), optionally combined with AWS Glue. DMS supports S3 as a target and can write Parquet as well as CSV. This approach copes with millions of records and supports continuous replication.

  • DMS is designed for replicating data from a source database to a target for analytics, backup, or operational purposes
  • DMS can replicate time series (timestamped) data from a source database to a target like any other table data
  • You can use an ongoing replication task in DMS, such as Full load + CDC, which migrates the existing data and then applies changes made on the source to the target
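To make the Full load + CDC option a bit more concrete, here is a minimal sketch using boto3. It assumes the replication instance and both endpoints already exist; the task name, ARNs, schema, and table are placeholders I made up, so swap in your own.

```python
import json


def build_table_mappings(schema: str, table: str) -> str:
    """Return DMS table-mapping JSON that selects exactly one table."""
    rules = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-one-table",
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(rules)


def create_full_load_and_cdc_task(dms, task_id, source_arn, target_arn,
                                  instance_arn, schema, table):
    """Create a task that copies existing rows, then streams ongoing changes.

    `dms` is expected to be boto3.client("dms"). Every identifier and ARN
    passed in is a placeholder chosen by the caller (an assumption here).
    """
    return dms.create_replication_task(
        ReplicationTaskIdentifier=task_id,
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load-and-cdc",
        TableMappings=build_table_mappings(schema, table),
    )
```

Note that CDC from RDS PostgreSQL relies on logical replication, so the source's parameter group needs `rds.logical_replication` enabled before the task can stream changes.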

To do so, follow this high-level guideline:

  1. Set up AWS Database Migration Service (DMS) by creating a replication instance and a source endpoint for the RDS PostgreSQL database.
  2. Create a target endpoint that points to the S3 bucket in the destination AWS account, then create a replication task that maps the desired table from the source endpoint to the target endpoint.
  3. To prepare the data in S3 for analytics, use AWS Glue: create a job that transforms the table data and saves it in a format such as Parquet, JSON, or ORC.
  4. Schedule the AWS Glue job to run at regular intervals so that the data in S3 stays up to date for analytics or data lake purposes.
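For the scheduling in step 4, one option is a scheduled Glue trigger. The snippet below is only a sketch: it assumes a Glue job already exists, and the trigger name, job name, and hourly cron expression are placeholders rather than recommendations.

```python
def build_glue_schedule(trigger_name: str, job_name: str,
                        cron: str = "cron(0 * * * ? *)") -> dict:
    """Build the arguments for a scheduled Glue trigger.

    The default cron expression fires at minute 0 of every hour.
    """
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": cron,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def create_glue_schedule(glue, trigger_name: str, job_name: str,
                         cron: str = "cron(0 * * * ? *)"):
    """Create the trigger; `glue` is expected to be boto3.client("glue")."""
    return glue.create_trigger(**build_glue_schedule(trigger_name, job_name, cron))
```

Keeping the argument builder separate from the API call makes it easy to inspect or log the schedule before anything is created.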

When replicating time series data with DMS, configure the replication task so that the timestamps and table structure are carried over accurately from the source to the target. In particular, check how timestamp columns and their precision map to the target format.
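As a rough illustration of a time-series-friendly configuration, the sketch below builds S3 target endpoint settings that write Parquet, add a timestamp column, and partition output folders by date. The bucket, folder, role ARN, endpoint name, and the `dms_commit_ts` column name are all assumptions for the example; please check the current DMS documentation for how each setting behaves before relying on it.

```python
def build_s3_target_settings(bucket: str, folder: str, role_arn: str,
                             timestamp_column: str = "dms_commit_ts") -> dict:
    """Build S3Settings for a DMS target endpoint aimed at time series use."""
    return {
        "BucketName": bucket,
        "BucketFolder": folder,
        "ServiceAccessRoleArn": role_arn,
        # Write Parquet instead of the default CSV.
        "DataFormat": "parquet",
        # Extra column carrying a timestamp for each replicated row.
        "TimestampColumnName": timestamp_column,
        # Date-based folder partitioning, e.g. .../2024/01/31/.
        "DatePartitionEnabled": True,
        "DatePartitionSequence": "YYYYMMDD",
        "DatePartitionDelimiter": "SLASH",
        # Give the bucket-owning (destination) account control of the objects.
        "CannedAclForObjects": "bucket-owner-full-control",
    }


def create_s3_target_endpoint(dms, endpoint_id: str, settings: dict):
    """Create the endpoint; `dms` is expected to be boto3.client("dms")."""
    return dms.create_endpoint(
        EndpointIdentifier=endpoint_id,
        EndpointType="target",
        EngineName="s3",
        S3Settings=settings,
    )
```

Because the bucket lives in another account, its bucket policy also has to allow the role named in `ServiceAccessRoleArn` to write to it; that part is not shown here.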

Whether the target is a different kind of data store or the same database engine, DMS can replicate your time series data. Table mappings and transformation rules let you select, rename, or reshape tables and columns to suit the requirements of the target.
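Here is a small sketch of what such a mapping might look like: one selection rule for the table plus one transformation rule that renames a column on the way to the target. The schema, table, and column names are hypothetical.

```python
import json


def build_mappings_with_rename(schema: str, table: str,
                               old_column: str, new_column: str) -> str:
    """Return table-mapping JSON: select one table and rename one column."""
    rules = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-table",
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            },
            {
                "rule-type": "transformation",
                "rule-id": "2",
                "rule-name": "rename-column",
                "rule-target": "column",
                "object-locator": {
                    "schema-name": schema,
                    "table-name": table,
                    "column-name": old_column,
                },
                "rule-action": "rename",
                "value": new_column,
            },
        ]
    }
    return json.dumps(rules, indent=2)


if __name__ == "__main__":
    # Example: expose "created_at" as "event_time" on the target.
    print(build_mappings_with_rename("public", "events", "created_at", "event_time"))
```

The resulting string is what you would pass as the task's table mappings.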

Hope this answer helps!

AWS
answered 6 months ago
