DMS to S3 endpoint with CDC: should I use partitioning?


I'm using DMS to replicate an RDS Aurora MySQL database to S3 (FullLoad + CDC) for analytics purposes. I use the option "partitioned = true" on the target S3 endpoint because I'm afraid CDC will generate too many objects per prefix. Unfortunately, the Glue Crawler seems completely lost when I activate partitioning, whereas everything works like a charm when I don't.

Is partitioning a bad idea in my situation? (One object per minute in a single prefix seems problematic anyway.)

Should I instead activate the "GlueCatalogGeneration" option on the endpoint?

lalvaro
asked 8 months ago · 501 views
1 Answer
Accepted Answer

Don't worry about having too many objects under the same prefix; nowadays S3 handles that pretty well.
The bigger issue is generating too many partitions. The purpose of partitioning is to let queries read less data, but if the partitions are too granular it becomes counterproductive. In most cases, hour or even day is the best granularity (depending on the data volume, how much history you keep, and how users will query the table).
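
To make the granularity point concrete, here is a rough Python sketch that counts how many date partitions one table accumulates per year at each granularity, and builds an `S3Settings` dictionary for a DMS S3 target endpoint with day or hour partitioning. It assumes the "partitioned = true" option in the question corresponds to DMS date-based folder partitioning (`DatePartitionEnabled`); the helper names, bucket name, and role ARN are hypothetical placeholders, and the Parquet format is just an illustrative choice.

```python
# Sketch only: helper names and example values are hypothetical.
MINUTES_PER_DAY = 24 * 60


def partitions_per_year(granularity: str, days: int = 365) -> int:
    """Number of date partitions one table accumulates in a year."""
    per_day = {"day": 1, "hour": 24, "minute": MINUTES_PER_DAY}
    return per_day[granularity] * days


def s3_settings(bucket: str, role_arn: str, granularity: str = "day") -> dict:
    """Build S3Settings for a DMS target endpoint with date partitioning.

    Assumption: only day and hour granularity are offered here,
    following the recommendation above.
    """
    sequence = {"day": "YYYYMMDD", "hour": "YYYYMMDDHH"}[granularity]
    return {
        "BucketName": bucket,               # placeholder bucket
        "ServiceAccessRoleArn": role_arn,   # placeholder role
        "DataFormat": "parquet",            # illustrative choice
        "DatePartitionEnabled": True,
        "DatePartitionSequence": sequence,
        "DatePartitionDelimiter": "SLASH",
    }


if __name__ == "__main__":
    for g in ("day", "hour", "minute"):
        print(g, partitions_per_year(g))
    print(s3_settings("my-analytics-bucket",
                      "arn:aws:iam::123456789012:role/dms-s3-role"))
```

A dictionary like this could be passed as the `S3Settings` argument of the boto3 DMS client's `create_endpoint` or `modify_endpoint` call. The count for minute-level partitions (hundreds of thousands per table per year) illustrates why overly fine partitioning hurts more than it helps.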

AWS
EXPERT
answered 8 months ago
AWS
EXPERT
reviewed 8 months ago
