DMS to S3 endpoint with CDC: should I use partitioning?


I'm using DMS to replicate an RDS Aurora MySQL database to S3 (FullLoad + CDC) for analytics purposes. I set the option "partitioned = true" on the target S3 endpoint because I'm afraid CDC will generate too many objects per prefix. Unfortunately, the Glue Crawler seems completely lost when I activate partitioning, whereas everything works like a charm when I don't.

Is partitioning a bad idea in my situation? (One object per minute in a single prefix seems problematic anyway.)

Should I instead enable the "GlueCatalogGeneration" option on the endpoint?

lalvaro
asked 8 months ago, 471 views
1 Answer
Accepted Answer

Don't worry about having too many objects under the same prefix; nowadays S3 handles that pretty well.
The bigger issue is generating too many partitions. The purpose of partitioning is to let readers scan less data, but if the partitions are too granular it becomes counterproductive. In most cases, hourly or even daily partitions are the best granularity (depending on the data volume, how much history you keep, and how users will query the table).
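
To get a rough feel for the granularity trade-off, here is a minimal Python sketch that counts how many partitions a table accumulates over a year at different granularities. The helper names (`partition_prefix`, `partition_count`), the `orders` table, and the `YYYY/MM/DD/HH` layout are assumptions made for illustration; they are not necessarily the exact folder layout DMS writes.

```python
from datetime import datetime, timezone

# Illustrative date layouts per granularity (assumed, not DMS's exact output).
FORMATS = {
    "day": "%Y/%m/%d",
    "hour": "%Y/%m/%d/%H",
    "minute": "%Y/%m/%d/%H/%M",
}

# Number of partitions created per day of history at each granularity.
PARTITIONS_PER_DAY = {"day": 1, "hour": 24, "minute": 24 * 60}


def partition_prefix(table: str, ts: datetime, granularity: str) -> str:
    """Build a hypothetical S3 key prefix for one partition of `table`."""
    return f"{table}/{ts.strftime(FORMATS[granularity])}/"


def partition_count(days_of_history: int, granularity: str) -> int:
    """Count the partitions accumulated over `days_of_history` days."""
    return days_of_history * PARTITIONS_PER_DAY[granularity]


if __name__ == "__main__":
    ts = datetime(2024, 3, 5, 14, 7, tzinfo=timezone.utc)
    for granularity in ("day", "hour", "minute"):
        print(
            granularity,
            partition_prefix("orders", ts, granularity),
            partition_count(365, granularity),
        )
    # day    orders/2024/03/05/        365
    # hour   orders/2024/03/05/14/     8760
    # minute orders/2024/03/05/14/07/  525600
```

With one year of history, daily partitioning gives a few hundred partitions per table, hourly gives a few thousand, and per-minute gives over half a million, which is the kind of over-partitioning that hurts query planning and crawling more than it helps.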

AWS EXPERT, answered 8 months ago
AWS EXPERT, reviewed 8 months ago
