Best Practice for streaming CDC from multiple tables


I'm using DMS to capture CDC from an RDS PostgreSQL database, then writing the changes to a Kinesis Data Stream, and finally using a Glue Streaming Job to process the data and write it to a Hudi data lake in S3. For multiple tables, is the best practice to have one DMS task, one Kinesis Data Stream, and one Glue job per table, or is there a way to process multiple tables at a time? Thanks in advance!
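
For context, here is a minimal sketch of one such per-table pipeline as a Glue streaming job, assuming the Glue Kinesis streaming source and a Hudi upsert via foreachBatch; the stream name, record schema, keys, and S3 path are illustrative only.

```python
# A minimal sketch of one per-table pipeline, assuming a Glue streaming job with
# the Kinesis source and a Hudi write; names, keys, and paths are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Read the DMS change records from the table's Kinesis Data Stream.
raw = (spark.readStream
       .format("kinesis")                      # Glue's Kinesis streaming source (assumption)
       .option("streamName", "cdc-orders")     # hypothetical stream name
       .option("startingPosition", "LATEST")
       .load())

# DMS emits JSON records; parse only the fields needed here (schema is illustrative).
schema = StructType([StructField("id", StringType()),
                     StructField("ts", StringType())])
changes = (raw.select(from_json(col("data").cast("string"), schema).alias("r"))
              .select("r.*"))

def write_hudi(batch_df, batch_id):
    # Upsert each micro-batch into the Hudi table on S3.
    (batch_df.write.format("hudi")
     .option("hoodie.table.name", "orders")
     .option("hoodie.datasource.write.recordkey.field", "id")
     .option("hoodie.datasource.write.precombine.field", "ts")
     .option("hoodie.datasource.write.operation", "upsert")
     .mode("append")
     .save("s3://my-datalake/orders/"))        # hypothetical bucket/path

query = changes.writeStream.foreachBatch(write_hudi).start()
query.awaitTermination()
```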

2 Answers

Thanks for the answer. So, in terms of maintainability it would be best to have one of each per table, but for cost savings running parallel tasks would be better, right?

answered 7 months ago
  • Correct. Parallel tasks can be challenging if you don't have prior experience maintaining streams.


Yes. If you run each query's foreach loop/await call in its own thread, you can start multiple streaming queries in the same cluster (Glue streaming job); for instance, with PySpark you can use a ThreadPool and submit one task per query.
This complicates monitoring, tuning, and operations in general, but it will save you cost significantly.
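
A minimal sketch of that pattern, assuming Spark Structured Streaming with foreachBatch inside a single Glue streaming job; the table list, per-table stream naming, record keys, and S3 paths are illustrative only.

```python
# A minimal sketch of running several streaming queries in one Glue streaming job.
# Table names, stream naming, keys, and paths are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

TABLES = ["orders", "customers", "payments"]    # hypothetical table list

def run_table_query(table):
    df = (spark.readStream
          .format("kinesis")                    # Glue's Kinesis streaming source (assumption)
          .option("streamName", f"cdc-{table}") # illustrative per-table stream naming
          .option("startingPosition", "LATEST")
          .load())

    def write_hudi(batch_df, batch_id):
        # Upsert each micro-batch into the table's Hudi dataset on S3.
        (batch_df.write.format("hudi")
         .option("hoodie.table.name", table)
         .option("hoodie.datasource.write.recordkey.field", "id")   # adjust to your schema
         .option("hoodie.datasource.write.precombine.field", "ts")
         .option("hoodie.datasource.write.operation", "upsert")
         .mode("append")
         .save(f"s3://my-datalake/{table}/"))   # hypothetical path

    # start() is non-blocking; awaitTermination() keeps this worker thread alive.
    query = df.writeStream.foreachBatch(write_hudi).start()
    query.awaitTermination()

# One thread per table, so each query's await call runs in its own thread.
with ThreadPoolExecutor(max_workers=len(TABLES)) as pool:
    list(pool.map(run_table_query, TABLES))
```

With this pattern a failed query does not stop the others, so per-query error handling and metrics become more important.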

AWS
EXPERT
answered 7 months ago
