DMS to S3 endpoint with CDC: should I use partitioning?


I'm using DMS to replicate an RDS Aurora MySQL database to S3 (FullLoad + CDC) for analytics purposes. I set the option "partitioned = true" on the target S3 endpoint because I'm afraid that CDC will generate too many objects per prefix. Unfortunately, the Glue Crawler seems completely lost when I activate partitioning, whereas everything works like a charm when I don't.

Is partitioning a bad idea in my situation? (One object per minute in a single prefix seems problematic anyway.)

Should I enable the "GlueCatalogGeneration" option on the endpoint instead?

1 Answer
Accepted Answer

Don't worry about having too many objects under the same prefix; nowadays S3 can handle that pretty well.
What is a big issue is generating too many partitions. The purpose of partitioning is to help readers reduce the amount of data they scan, but if the partitions are too granular, it becomes counterproductive. In most cases, hour or even day is the best granularity (depending on how much data and history you have and how users will query the table).
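
To make the granularity trade-off concrete, here is a minimal sketch that counts how many partitions each choice would create and builds matching endpoint settings. The helper names (`partition_count`, `s3_settings`) are hypothetical, and the "minute" row is only there for comparison. The setting names (`DatePartitionEnabled`, `DatePartitionSequence`, `DatePartitionDelimiter`) are given as I understand the DMS S3 target endpoint settings, so please check them against the current DMS documentation before relying on them.

```python
# Partitions created per table and per day for each candidate granularity.
# "minute" is included only to illustrate an overly granular layout.
PARTITIONS_PER_DAY = {"day": 1, "hour": 24, "minute": 24 * 60}

# Assumed mapping from granularity to the DMS DatePartitionSequence value.
SEQUENCE_BY_GRANULARITY = {"day": "YYYYMMDD", "hour": "YYYYMMDDHH"}


def partition_count(granularity: str, history_days: int) -> int:
    """Number of partitions one table accumulates over history_days."""
    return PARTITIONS_PER_DAY[granularity] * history_days


def s3_settings(granularity: str) -> dict:
    """Build date-partitioning settings for a DMS S3 target endpoint (sketch)."""
    return {
        "DatePartitionEnabled": True,
        "DatePartitionSequence": SEQUENCE_BY_GRANULARITY[granularity],
        "DatePartitionDelimiter": "SLASH",
    }


if __name__ == "__main__":
    for granularity in PARTITIONS_PER_DAY:
        print(granularity, partition_count(granularity, history_days=365))
    print(s3_settings("day"))
```

With one year of history, daily partitioning yields a few hundred partitions per table, while per-minute partitioning would yield more than half a million, which is the kind of layout that slows down readers and crawlers. A dictionary like the one returned by `s3_settings` could presumably be passed as `S3Settings` when modifying the endpoint through the SDK.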

Answered 8 months ago by AWS
Reviewed 8 months ago by AWS
