New Partition Availability

Let's say that I am executing an INSERT INTO statement in Athena that is writing new partitions. When are those new partitions available to be queried in a SELECT query? Is it possible to run a SELECT statement while that INSERT INTO is running on the same Glue Catalog table and get partial data from the new partition - or will that partition become available only after it is fully written?

Asked 2 years ago · 237 views
1 Answer
New partitions become visible to SELECT queries once their metadata is available in the catalog, which happens after running either

MSCK REPAIR TABLE

or (more lightweight and therefore preferred)

ALTER TABLE ... ADD PARTITION

You can, however, add those partitions "in advance", before any data has been written to them. In that case the data becomes visible to SELECT queries as soon as INSERT INTO ... SELECT adds files to those partitions.
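As a rough illustration of registering a partition ahead of time, the sketch below only builds the ALTER TABLE ... ADD PARTITION statement as a string. The table name `events`, the partition column `dt`, and the S3 path are hypothetical placeholders rather than anything from the question, and the helper `add_partition_sql` is made up for this example. It also assumes the partition values are trusted, since no quoting or escaping is attempted.

```python
def add_partition_sql(table, partition, location=None):
    """Build an Athena ALTER TABLE ... ADD PARTITION statement.

    All arguments are hypothetical example inputs. Values are assumed to be
    trusted; this sketch does not escape quotes.
    """
    # Render the partition spec, e.g. dt = '2024-01-01'
    spec = ", ".join(
        "{} = '{}'".format(key, value) for key, value in partition.items()
    )
    # IF NOT EXISTS keeps the statement from failing if the partition
    # has already been registered
    sql = "ALTER TABLE {} ADD IF NOT EXISTS PARTITION ({})".format(table, spec)
    if location is not None:
        sql += " LOCATION '{}'".format(location)
    return sql


# Hypothetical table, partition value, and bucket
statement = add_partition_sql(
    "events",
    {"dt": "2024-01-01"},
    "s3://example-bucket/events/dt=2024-01-01/",
)
print(statement)
```

The resulting string could then be run in the Athena console or submitted through whichever client you already use, before the INSERT INTO starts writing files to that location.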

https://docs.aws.amazon.com/athena/latest/ug/msck-repair-table.html

https://docs.aws.amazon.com/athena/latest/ug/alter-table-add-partition.html

AWS
Alex_T
Answered 2 years ago
  • If I don't ADD PARTITION in advance, and don't call MSCK REPAIR, is it still the case that "the data will be available to SELECT queries as soon as some of the files are added to those partitions by INSERT INTO SELECT"? That would essentially be the same as saying that in this scenario, the existence of a partition does not guarantee the corresponding INSERT INTO has finished writing the partition.
