DMS to S3 endpoint with CDC: should I use partitioning?


I'm using DMS to replicate an RDS Aurora MySQL database to S3 (Full Load + CDC) for analytics purposes. I use the option "partitioned = true" on the target S3 endpoint because I'm afraid CDC will generate too many objects per prefix. Unfortunately, the Glue Crawler seems completely lost when I enable partitioning, whereas everything works like a charm when I don't.

Is partitioning a bad idea in my situation? (One object per minute in a single prefix seems problematic anyway.)

Should I instead enable the "GlueCatalogGeneration" option on the endpoint?
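For reference, here is a minimal sketch of the S3 target endpoint settings involved, written as the keyword arguments you would pass to `boto3.client("dms").create_endpoint(**kwargs)`. It assumes that the "partitioned = true" option above corresponds to the DMS `DatePartitionEnabled` setting. The endpoint identifier, bucket, role ARN, and the chosen batch interval and file size are illustrative assumptions, and `GlueCatalogGeneration` requires a DMS engine version that supports it.

```python
# A minimal sketch of S3Settings for an AWS DMS S3 target endpoint.
# Keys are DMS S3Settings fields; the values are illustrative assumptions,
# not recommendations for every workload.

def build_s3_target_settings(bucket, service_role_arn, glue_catalog=False):
    """Return keyword arguments for the DMS CreateEndpoint API (S3 target)."""
    s3_settings = {
        "BucketName": bucket,
        "ServiceAccessRoleArn": service_role_arn,
        "DataFormat": "parquet",
        # Write CDC files into date folders, e.g. .../2024/05/17/
        "DatePartitionEnabled": True,
        "DatePartitionSequence": "YYYYMMDD",  # daily partitions
        "DatePartitionDelimiter": "SLASH",
        # Flush a CDC file after 300 s or once it reaches ~64 MB,
        # whichever comes first (the default interval is 60 s).
        "CdcMaxBatchInterval": 300,
        "CdcMinFileSize": 64000,  # in KB
    }
    if glue_catalog:
        # Let DMS create the Glue Data Catalog tables instead of a crawler.
        s3_settings["GlueCatalogGeneration"] = True
    return {
        "EndpointIdentifier": "aurora-to-s3-target",  # hypothetical name
        "EndpointType": "target",
        "EngineName": "s3",
        "S3Settings": s3_settings,
    }
```

Raising `CdcMaxBatchInterval` above its 60-second default is one way to reduce the "one object per minute" rate, at the cost of higher replication latency on S3.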

lalvaro
asked 8 months ago · 498 views
1 answer
Accepted answer

Don't worry about "too many objects under the same prefix"; nowadays S3 handles that well (it supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix).
What is a real problem is generating too many partitions. The purpose of partitioning is to help readers reduce the amount of data they scan, but if the partitions are too granular, it becomes counterproductive. In most cases, hourly or even daily granularity works best, depending on how much data you have, how much history you keep, and how users will query the table.
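To put rough numbers on this trade-off, here is a back-of-the-envelope calculation. It assumes one CDC file per minute for a single table (as in the question, and matching the DMS default `CdcMaxBatchInterval` of 60 seconds) and one year of history; `partition_stats` and `put_rate_fraction` are hypothetical helpers for illustration.

```python
# Back-of-the-envelope comparison of partition granularities for one table,
# assuming a steady rate of CDC files and a fixed history window.

MINUTES_PER_DAY = 24 * 60

# Width of one partition, in minutes, for each granularity.
GRANULARITY_MINUTES = {
    "minute": 1,
    "hour": 60,
    "day": MINUTES_PER_DAY,
}

# Documented S3 baseline: at least 3,500 PUT requests per second per prefix.
S3_PUT_PER_SECOND_PER_PREFIX = 3500


def partition_stats(history_days, files_per_minute=1.0):
    """Return {granularity: (partition_count, files_per_partition)}."""
    total_minutes = history_days * MINUTES_PER_DAY
    stats = {}
    for name, width in GRANULARITY_MINUTES.items():
        partitions = total_minutes // width
        files_per_partition = width * files_per_minute
        stats[name] = (partitions, files_per_partition)
    return stats


def put_rate_fraction(files_per_minute=1.0):
    """Fraction of the per-prefix PUT baseline used by the CDC file rate."""
    return (files_per_minute / 60) / S3_PUT_PER_SECOND_PER_PREFIX


if __name__ == "__main__":
    for name, (parts, per_part) in partition_stats(365).items():
        print(f"{name:>6}: {parts:>7} partitions, {per_part:g} files each")
    print(f"PUT rate vs. prefix baseline: {put_rate_fraction():.2e}")
```

For 365 days this gives 525,600 minute-level partitions holding one file each, versus 8,760 hourly or 365 daily partitions. One PUT per minute uses only a tiny fraction of the per-prefix request baseline, which is why the partition count matters far more than the object count per prefix.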

AWS EXPERT
answered 8 months ago
AWS EXPERT
reviewed 8 months ago
