DMS to S3 endpoint with CDC: should I use partitioning?


I'm using DMS to replicate an RDS Aurora MySQL database to S3 (FullLoad + CDC) for analytics purposes. I set the option "partitioned = true" on the target S3 endpoint because I'm afraid CDC will generate too many objects per prefix. Unfortunately, the Glue Crawler seems completely lost when I activate partitioning, whereas everything works like a charm when I don't.

Is partitioning a bad idea in my situation? (One object per minute in a single prefix seems problematic anyway.)

Should I instead activate the "GlueCatalogGeneration" option on the endpoint?

lalvaro
asked 7 months ago, 437 views
1 Answer
Accepted Answer

Don't worry about "too many objects under the same prefix"; nowadays S3 handles that pretty well.
The real issue is generating too many partitions. The purpose of partitioning is to let readers reduce the amount of data they scan, but if the partitions are too granular it becomes counterproductive. In most cases, hour or even day is the best granularity (depending on how much data and history there is and how users will query the table).
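
To make the granularity trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of DMS or Glue; the function name `partition_stats` and its parameters are hypothetical, and it assumes CDC writes objects at a steady interval (one per minute, as in the question).

```python
from datetime import timedelta

# Partition widths to compare. "minute" is included only to show the extreme case.
GRANULARITIES = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def partition_stats(history_days, object_interval_seconds, granularity):
    """Estimate (partition count, objects per partition) for one table.

    Assumes a steady stream of one CDC object every
    `object_interval_seconds` seconds over `history_days` days.
    """
    width = GRANULARITIES[granularity]
    history = timedelta(days=history_days)
    # Dividing one timedelta by another with // yields an integer.
    partitions = history // width
    objects_per_partition = max(
        1, int(width.total_seconds() // object_interval_seconds)
    )
    return partitions, objects_per_partition


if __name__ == "__main__":
    for name in GRANULARITIES:
        count, per_partition = partition_stats(365, 60, name)
        print(f"{name}: {count} partitions, {per_partition} objects each")
```

Under these assumptions, one year of history at one object per minute gives 525,600 single-object partitions at minute granularity, 8,760 partitions of 60 objects at hour granularity, or 365 partitions of 1,440 objects at day granularity. The last two are far easier for a crawler and for query engines to handle.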

AWS
EXPERT
answered 7 months ago
AWS
EXPERT
reviewed 7 months ago
