Best Practice for streaming CDC from multiple tables

0

I'm using DMS to capture CDC from an RDS PostgreSQL database, writing the changes to a Kinesis Data Stream, and finally using a Glue Streaming job to process the data and write it to a Hudi data lake in S3. For multiple tables, is the best practice to have one DMS task, one Kinesis Data Stream, and one Glue job per table, or is there a way to process multiple tables at a time? Thanks in advance!
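For context, here is roughly what my current single-table job looks like. It's a simplified sketch: the stream ARN, table name, key fields, and S3 paths are placeholders, and it assumes the Hudi libraries are enabled for the Glue job.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the DMS CDC records from Kinesis (placeholder stream ARN).
cdc_frame = glue_context.create_data_frame.from_options(
    connection_type="kinesis",
    connection_options={
        "streamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/my-cdc-stream",
        "startingPosition": "TRIM_HORIZON",
        "classification": "json",
        "inferSchema": "true",
    },
)

def write_hudi(batch_df, batch_id):
    # DMS's default JSON envelope nests the row columns under "data";
    # flatten them before upserting into Hudi (key fields below are assumptions).
    if batch_df.count() == 0:
        return
    rows = batch_df.select("data.*")
    (
        rows.write.format("hudi")
        .option("hoodie.table.name", "orders")
        .option("hoodie.datasource.write.recordkey.field", "order_id")
        .option("hoodie.datasource.write.precombine.field", "updated_at")
        .option("hoodie.datasource.write.operation", "upsert")
        .mode("append")
        .save("s3://my-data-lake/orders/")
    )

# Process the stream in micro-batches with a checkpoint in S3.
glue_context.forEachBatch(
    frame=cdc_frame,
    batch_function=write_hudi,
    options={
        "windowSize": "60 seconds",
        "checkpointLocation": "s3://my-data-lake/checkpoints/orders/",
    },
)
job.commit()
```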

2 Answers
0

Thanks for the answer. So, in terms of maintainability it would be best to have one of each per table, but for cost savings, parallel tasks would be better, right?

answered 7 months ago
  • Correct, parallel tasks can be challenging if you don't have prior experience maintaining streams.

0

Yes. If you start each streaming query (the foreachBatch/awaitTermination call) from its own thread, you can run multiple streaming queries in the same cluster (one Glue streaming job); in PySpark, for instance, you can use a ThreadPool with one task per query.
This complicates monitoring, tuning, and operations in general, but it will save you significant cost.
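Here is a minimal sketch of that pattern, assuming a single Kinesis stream carrying DMS CDC records for several tables and the Spark Kinesis connector bundled with Glue streaming jobs. The stream name, region, table names, schemas, key fields, and S3 paths are placeholders you would replace, and the exact Kinesis source options can vary by connector version.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, get_json_object

spark = SparkSession.builder.getOrCreate()

# Placeholder per-table configuration: record keys, precombine fields,
# schemas, and paths are assumptions you would replace with your own.
TABLES = [
    {
        "name": "orders",
        "record_key": "order_id",
        "precombine": "updated_at",
        "schema": "order_id INT, customer_id INT, amount DOUBLE, updated_at TIMESTAMP",
        "path": "s3://my-data-lake/orders/",
    },
    {
        "name": "customers",
        "record_key": "customer_id",
        "precombine": "updated_at",
        "schema": "customer_id INT, name STRING, email STRING, updated_at TIMESTAMP",
        "path": "s3://my-data-lake/customers/",
    },
]

def run_table_query(cfg):
    """Run one streaming query that upserts a single table's CDC records into Hudi."""
    raw = (
        spark.readStream.format("kinesis")
        .option("streamName", "my-cdc-stream")                             # placeholder
        .option("endpointUrl", "https://kinesis.us-east-1.amazonaws.com")  # placeholder region
        .option("startingPosition", "TRIM_HORIZON")
        .load()
    )

    def upsert_batch(batch_df, batch_id):
        # Keep only this table's records (DMS puts the source table in metadata.table-name),
        # parse the "data" payload with the table's schema, then upsert into Hudi.
        parsed = (
            batch_df.selectExpr("CAST(data AS STRING) AS json")
            .where(get_json_object(col("json"), "$.metadata.table-name") == cfg["name"])
            .select(from_json(get_json_object(col("json"), "$.data"), cfg["schema"]).alias("r"))
            .select("r.*")
        )
        (
            parsed.write.format("hudi")
            .option("hoodie.table.name", cfg["name"])
            .option("hoodie.datasource.write.recordkey.field", cfg["record_key"])
            .option("hoodie.datasource.write.precombine.field", cfg["precombine"])
            .option("hoodie.datasource.write.operation", "upsert")
            .mode("append")
            .save(cfg["path"])
        )

    query = (
        raw.writeStream.foreachBatch(upsert_batch)
        .option("checkpointLocation", cfg["path"] + "_checkpoints/")
        .start()
    )
    query.awaitTermination()

# One thread per table so every query's awaitTermination() can block independently.
with ThreadPoolExecutor(max_workers=len(TABLES)) as pool:
    futures = [pool.submit(run_table_query, cfg) for cfg in TABLES]
    for future in as_completed(futures):
        future.result()  # re-raise the first query failure so the Glue job fails visibly
```

The trade-off is exactly what was mentioned above: one job is cheaper (a single set of DPUs shared across queries), but metrics, logs, and retries for all tables are tangled together in that one job.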

AWS
Expert
answered 7 months ago
