Best Practice for streaming CDC from multiple tables


I'm using DMS to capture CDC from an RDS PostgreSQL database, writing the changes to a Kinesis Data Stream, and then using a Glue Streaming Job to process the data and write it to a Hudi data lake in S3. For multiple tables, is the best practice to have one DMS task, one Kinesis Data Stream, and one Glue Job per table, or is there a way to process multiple tables at a time? Thanks in advance!

2 Answers

Thanks for the answer. So, in terms of maintainability it would be best to have one of each per table, but for cost savings, parallel tasks would be better, right?

answered 7 months ago
  • Correct, parallel tasks can be challenging if you don't have prior experience maintaining streams.


Yes. If you start each streaming query (its foreachBatch/awaitTermination call) in its own thread, you can run multiple streaming queries in the same cluster (a single Glue streaming job), for instance in PySpark using a ThreadPool with one task per table (see the sketch below).
This complicates monitoring, tuning, and operations in general, but will save you significant cost.
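
For reference, here is a minimal PySpark sketch of that pattern, assuming one DMS task feeding one Kinesis stream and one Glue streaming job that fans the records out into one Hudi table per source table. The stream name, table list, record-key/precombine columns, S3 paths, and the Kinesis source option names are illustrative assumptions, not values from this thread, so check them against your DMS record format and the Glue Kinesis connector version you use.

```python
from concurrent.futures import ThreadPoolExecutor

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("multi-table-cdc").getOrCreate()

# One DMS task can replicate many tables into a single Kinesis stream; each DMS
# record names its source table inside the "metadata" envelope of the JSON payload.
raw = (
    spark.readStream
    .format("kinesis")                       # Glue's Kinesis source; option names vary by connector version
    .option("streamName", "cdc-stream")      # hypothetical stream name
    .option("startingPosition", "LATEST")
    .load()
)

# Only the table name is needed for routing; the rest of the payload stays as JSON text.
envelope = StructType([
    StructField("metadata", StructType([StructField("table-name", StringType())]))
])
events = (
    raw.select(col("data").cast("string").alias("json"))
       .withColumn("table", from_json(col("json"), envelope)["metadata"]["table-name"])
)


def run_query(table: str) -> None:
    """Build, start, and block on one streaming query that upserts a single table to Hudi."""
    hudi_options = {
        "hoodie.table.name": table,
        "hoodie.datasource.write.recordkey.field": "id",   # assumed primary-key column
        "hoodie.datasource.write.precombine.field": "ts",  # assumed ordering column
        "hoodie.datasource.write.operation": "upsert",
    }

    def write_batch(batch_df, batch_id):
        if batch_df.rdd.isEmpty():
            return
        # Parse each record and keep only the row image (the "data" object) of the DMS message.
        rows = spark.read.json(batch_df.select("json").rdd.map(lambda r: r["json"])).select("data.*")
        (rows.write.format("hudi")
             .options(**hudi_options)
             .mode("append")
             .save(f"s3://my-datalake/{table}/"))           # hypothetical target path

    query = (
        events.filter(col("table") == table)
              .writeStream
              .foreachBatch(write_batch)
              .option("checkpointLocation", f"s3://my-datalake/checkpoints/{table}/")
              .start()
    )
    query.awaitTermination()


tables = ["orders", "customers", "payments"]                # example table list

# One thread per table; each thread blocks on its own query's awaitTermination().
with ThreadPoolExecutor(max_workers=len(tables)) as pool:
    list(pool.map(run_query, tables))                       # list() surfaces any thread exceptions
```

Because each query keeps its own checkpoint location, the per-table streams can fail and recover independently even though they share one job's resources.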

AWS
EXPERT
answered 7 months ago
