New Partition Availability

Let's say that I am executing an INSERT INTO statement in Athena that writes new partitions. When are those new partitions available to be queried in a SELECT query? Is it possible to run a SELECT statement on the same Glue Catalog table while that INSERT INTO is running and get partial data from the new partition, or will that partition become available only after it is fully written?

New partitions become visible to SELECT queries once their metadata is available in the catalog, which happens after running either

MSCK REPAIR TABLE

or the more lightweight (and therefore preferred)

ALTER TABLE ... ADD PARTITION
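A minimal sketch of both options, assuming a hypothetical table `sales` partitioned by a string column `dt`, with data stored under Hive-style prefixes such as `s3://example-bucket/sales/dt=2024-01-15/` (the table, column, and bucket names are illustrative and not from the question):

```sql
-- Option 1: scan the table's S3 location and register every Hive-style
-- partition (dt=...) that the catalog does not know about yet.
-- This can be slow on tables with many partitions.
MSCK REPAIR TABLE sales;

-- Option 2: register only the one partition you need
-- (hypothetical partition value and S3 path).
ALTER TABLE sales ADD IF NOT EXISTS
  PARTITION (dt = '2024-01-15')
  LOCATION 's3://example-bucket/sales/dt=2024-01-15/';
```

Option 2 touches a single partition instead of listing the whole table location, which is why it is the lighter choice.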

You can, however, add those partitions "in advance", before any data is written to them. In that case, the data becomes available to SELECT queries as soon as INSERT INTO ... SELECT writes the first files into those partitions.
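The "register first, write later" sequence might look like the sketch below. It reuses the hypothetical `sales` table, adds a hypothetical source table `staging_sales` with columns `order_id`, `amount`, and `dt`, and assumes the registered LOCATION matches the Hive-style path that the INSERT writes to:

```sql
-- 1. Register the partition before any data exists;
--    queries against it return zero rows at this point.
ALTER TABLE sales ADD IF NOT EXISTS
  PARTITION (dt = '2024-01-16')
  LOCATION 's3://example-bucket/sales/dt=2024-01-16/';

-- 2. Write the data. For a partitioned target table, the partition
--    column (dt) goes last in the SELECT list.
INSERT INTO sales
SELECT order_id, amount, dt
FROM staging_sales
WHERE dt = '2024-01-16';

-- 3. Run from a separate session while step 2 is still in progress.
--    Per the answer, this may already return rows from the files
--    written so far, so the count can be incomplete.
SELECT count(*) FROM sales WHERE dt = '2024-01-16';
```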

https://docs.aws.amazon.com/athena/latest/ug/msck-repair-table.html

https://docs.aws.amazon.com/athena/latest/ug/alter-table-add-partition.html

AWS
Alex_T
answered 2 years ago
  • If I don't ADD PARTITION in advance, and don't call MSCK REPAIR, is it still the case that "the data will be available to SELECT queries as soon as some of the files are added to those partitions by INSERT INTO SELECT"? That would essentially be the same as saying that in this scenario, the existence of a partition does not guarantee the corresponding INSERT INTO has finished writing the partition.
