Best Practice for streaming CDC from multiple tables

I'm using DMS to capture CDC from an RDS PostgreSQL database, writing the changes to a Kinesis Data Stream, and then using a Glue Streaming Job to process the data and write it to a Hudi data lake in S3. For multiple tables, is the best practice to have one DMS task, one Kinesis Data Stream, and one Glue job per table, or is there a way to process multiple tables at a time? Thanks in advance!

2 Answers

Thanks for the answer. So in terms of maintainability it would be best to have one of each per table, but for cost savings, parallel tasks would be better, right?

answered 7 months ago
  • Correct. Parallel tasks can be challenging if you don't have prior experience maintaining streams.

Yes. If you make the blocking forEachBatch/awaitTermination call inside a thread, you can start multiple streaming queries in the same cluster (a single Glue streaming job). In PySpark, for instance, you can use a ThreadPool with one task per table.
This complicates monitoring, tuning, and operations in general, but it can reduce cost significantly.
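
As a rough illustration of the threading idea, here is a minimal sketch in plain Python. It assumes each table's streaming loop is wrapped in a blocking callable; `start_stream` and `run_streams_in_parallel` are hypothetical names, not part of Glue or Spark. In a real Glue job, `start_stream` would build the streaming DataFrame for one table and make the blocking streaming call (for example Glue's forEachBatch); that part is left abstract here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_streams_in_parallel(
    tables: Iterable[str],
    start_stream: Callable[[str], None],
) -> Dict[str, str]:
    """Run one blocking streaming loop per table, each on its own thread.

    `start_stream` is a hypothetical callable supplied by the job. It is
    expected to block until the stream for the given table stops or fails.
    """
    tables = list(tables)
    results: Dict[str, str] = {}
    # One worker per table so that no stream waits for another to finish.
    with ThreadPoolExecutor(max_workers=max(1, len(tables))) as pool:
        futures = {table: pool.submit(start_stream, table) for table in tables}
        for table, future in futures.items():
            try:
                future.result()  # blocks until this table's stream ends
                results[table] = "stopped"
            except Exception as exc:
                # Record the failure per table instead of losing it in the thread.
                results[table] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    # Stand-in for a real streaming loop; it returns immediately.
    def demo_stream(table: str) -> None:
        print(f"streaming loop for {table} started and stopped")

    print(run_streams_in_parallel(["orders", "customers"], demo_stream))
```

Collecting a per-table status is one way to soften the monitoring problem mentioned above: a failure in one table's stream is surfaced by name rather than silently ending its thread.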

AWS
EXPERT
answered 7 months ago
