Best Practice for streaming CDC from multiple tables


I'm using DMS to capture CDC from an RDS PostgreSQL database, then writing the changes to a Kinesis Data Stream and finally using a Glue Streaming job to process the data and write it to a Hudi data lake in S3. For multiple tables, is the best practice to have one DMS task, one Kinesis Data Stream, and one Glue job per table, or is there a way to process multiple tables at a time? Thanks in advance!

2 Answers

Thanks for the answer. So, in terms of maintainability it would be best to have one of each per table, but for cost savings, parallel tasks would be better, right?

answered 7 months ago
  • correct, parallel tasks can be challenging if you don't have prior experience maintaining streams


Yes. If you start each streaming query (the foreachBatch / awaitTermination call) in its own thread, you can run multiple streaming queries in the same cluster (a single Glue streaming job), for instance by using a ThreadPool in PySpark and submitting one task per query.
This complicates monitoring, tuning, and operations in general, but it can save you cost significantly.
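
A minimal sketch of that idea, assuming one Kinesis stream per table and illustrative Hudi/Kinesis options (the table names, stream names, S3 paths, and options below are placeholders, not your actual setup; a real job would also need Hudi record key / precombine settings):

```python
# Sketch: run several Structured Streaming queries inside one Glue streaming job
# by starting each query from a thread pool.
from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-table-cdc").getOrCreate()

# Hypothetical per-table configuration.
TABLES = {
    "orders":    {"stream": "cdc-orders",    "hudi_path": "s3://my-lake/orders/"},
    "customers": {"stream": "cdc-customers", "hudi_path": "s3://my-lake/customers/"},
}

def start_query(table, cfg):
    # Read this table's CDC records from its Kinesis stream.
    df = (spark.readStream
          .format("kinesis")
          .option("streamName", cfg["stream"])
          .option("startingPosition", "LATEST")
          .load())

    def write_batch(batch_df, batch_id):
        # Write each micro-batch to the table's Hudi location.
        (batch_df.write
            .format("hudi")
            .option("hoodie.table.name", table)
            .mode("append")
            .save(cfg["hudi_path"]))

    query = (df.writeStream
               .foreachBatch(write_batch)
               .option("checkpointLocation", f"s3://my-lake/checkpoints/{table}/")
               .start())
    query.awaitTermination()  # blocks, which is why each query runs in its own thread

# One thread per table: all streaming queries share the same Glue cluster.
with ThreadPoolExecutor(max_workers=len(TABLES)) as pool:
    for table, cfg in TABLES.items():
        pool.submit(start_query, table, cfg)
```

The trade-off is as described above: one job is cheaper, but each query still needs its own checkpoint location, and failures in one thread are harder to spot and tune for than with one job per table.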

AWS
EXPERT
answered 7 months ago
