AWS Data Catalog table index not working

Hello,

For an AWS Glue Data Catalog table, I ran a Glue job (structure: Amazon S3 -> Change Schema -> AWS Glue Data Catalog) and populated the table with string-only records. All the actions were done from the Management Console: table, partition, and index creation, visual ETL creation, and the ETL run.

After checking that the records existed in the table, I created a partition and an index. After this, all the table rows were deleted, and COUNT from Athena returned 0.

If I created the partition and index before loading the data with Glue, no data was inserted.

What are the rules for creating partitions and indexes in a Data Catalog table?

Thank you,
Mihai

Asked 4 months ago · 156 views
1 Answer
Accepted Answer

The index only speeds up the partition lookup; it doesn't affect the data. The partition columns, however, do affect how the data is structured on S3 and how it is inserted.
Normally you define the partition columns and index, and the tool writing into the table registers the partitions after inserting the data. If the partitions are not registered with specific values and paths, the data is not available for querying.
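
The registration step described above can be sketched in Python with boto3. This is only an illustration under stated assumptions: the helper names (build_partition_input, register_partitions), the bucket, and the partition keys are hypothetical, and the data is assumed to be laid out in Hive-style key=value folders under the table location. The boto3 calls used are get_table and batch_create_partition from the Glue client.

```python
from typing import Dict, List


def build_partition_input(
    table_location: str,
    partition_keys: List[str],
    values: List[str],
    storage_descriptor: Dict,
) -> Dict:
    """Build one PartitionInput dict, assuming Hive-style paths (key=value/...)."""
    if len(partition_keys) != len(values):
        raise ValueError("one value is required per partition key")
    suffix = "/".join(f"{k}={v}" for k, v in zip(partition_keys, values))
    # Reuse the table's descriptor (format, SerDe, columns) and point it
    # at the folder that holds this partition's files.
    descriptor = dict(storage_descriptor)
    descriptor["Location"] = f"{table_location.rstrip('/')}/{suffix}/"
    return {"Values": list(values), "StorageDescriptor": descriptor}


def register_partitions(database: str, table: str, value_sets: List[List[str]]) -> None:
    """Register partitions in the Data Catalog (hypothetical helper).

    Needs boto3 and AWS credentials; batch_create_partition accepts
    at most 100 partitions per call.
    """
    import boto3  # imported lazily so the helper above works without AWS access

    glue = boto3.client("glue")
    tbl = glue.get_table(DatabaseName=database, Name=table)["Table"]
    keys = [col["Name"] for col in tbl.get("PartitionKeys", [])]
    descriptor = tbl["StorageDescriptor"]
    inputs = [
        build_partition_input(descriptor["Location"], keys, values, descriptor)
        for values in value_sets
    ]
    glue.batch_create_partition(
        DatabaseName=database, TableName=table, PartitionInputList=inputs
    )
```

For example, build_partition_input("s3://my-bucket/data/", ["year", "month"], ["2024", "01"], descriptor) points the partition at s3://my-bucket/data/year=2024/month=01/. If the files are not actually under that path, the partition is registered but queries on it return no rows. For Hive-style layouts, running MSCK REPAIR TABLE from Athena is another way to register the partitions found under the table location.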

AWS Expert, answered 4 months ago
AWS Expert, reviewed 4 months ago
  • Thanks a lot,

    After deleting the partition and index, I can see the data in the table again.

    If I use Glue ETL as the tool to write into the table, how should I register the partitions and their values/paths in the above configuration? Should I use a specific component, or should I be able to do this from the AWS Glue Data Catalog component?

    When you say it speeds up the lookup, does this also mean it decreases the volume of data scanned (scanning the index instead of the data)?

    Thank you, Mihai
