By using AWS re:Post, you agree to the Terms of Use

RDS to S3 to RDS or Redshift near realtime

0

IHAC who uses a set of RDS (250 DBs, one per client) instances for OLTP, and another set with transformed data for analytics.

All there clients have to have their data segregated (about 250 clients right now, but growing).

They are attempting to migrate with DMS but having problems. RDS to RDS is really slow for them and the CDC doesn't work well, so they switch to RDS to S3, and Glue to transform and put into analytic servers.

Their current solution is simple and fast for them, but then they faced the problem of the endpoint limit for DMS.

They only have about 350 GB of data, but I still think they might be better off with Redshift or Redshift Spectrum for OLAP.

There is still the issue of the most effective way to migrate and transform the data.

They may have had problems with DMS CDC due to incorrect usage.

However, there is also RDS Snapshot export which could be used with Redshift Spectrum. Would this be effective with incrementals? https://aws.amazon.com/about-aws/whats-new/2020/01/announcing-amazon-relational-database-service-snapshot-export-to-s3/

There are going to be hundreds of databases because of client segregation. Each would have the same table names, so would need a separate directory in S3.

Any suggestions on approach?

1 Answers
0
Accepted Answer

Depends on how your customer defines near-real-time you have few options but there are various costs and consideration with all that :

  1. WAL replication to kinesis : https://aws.amazon.com/blogs/database/stream-changes-from-amazon-rds-for-postgresql-using-amazon-kinesis-data-streams-and-aws-lambda/. This will take up resources from your primary but you can reduce that by setting up replica.
  2. Scheduled Jobs that does S3 exports of changed data. This is the least expensive as you can decide what tables to export data from and how often. You can decide to do that every 10 mins or so. AWS Batch can be used with Lambda to accomplish this.
  3. Upgrade to Aurora and utilize database streams. This way you have minimal impact on writer and you can just convert SQL statements to data files using lambda.
  4. Use DMS with RDS as source and S3 as target. Like #1 i would recommend you set it up on replica.
answered a year ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions