New Partition Availability


Let's say that I am executing an INSERT INTO statement in Athena that is writing new partitions. When are those new partitions available to be queried in a SELECT query? Is it possible to run a SELECT statement while that INSERT INTO is running on the same Glue Catalog table and get partial data from the new partition - or will that partition become available only after it is fully written?

asked 2 years ago, 237 views
1 Answer

New partitions become visible to SELECT queries once their metadata is available in the catalog, which happens after either

MSCK REPAIR TABLE

or (more lightweight and therefore preferred)

ALTER TABLE ... ADD PARTITION

You can, however, add those partitions "in advance", before any data has been written to them. In that case the data becomes available to SELECT queries as soon as some of the files are added to those partitions by INSERT INTO SELECT.
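As a rough illustration of the statements mentioned above, here is a minimal sketch. The table names `events` and `staging_events`, the partition column `dt`, the data columns, and the `s3://example-bucket/...` location are hypothetical and not taken from the question. Athena runs one statement per query, so each statement below would be submitted separately.

```sql
-- Option 1: scan the table location and register any Hive-style
-- partition directories (dt=...) that the catalog does not know about yet.
MSCK REPAIR TABLE events;

-- Option 2 (lighter weight): register one specific partition explicitly.
-- This can be run "in advance", before any files exist at the location.
ALTER TABLE events ADD IF NOT EXISTS
  PARTITION (dt = '2024-01-01')
  LOCATION 's3://example-bucket/events/dt=2024-01-01/';

-- Write data into the partition. In an INSERT INTO ... SELECT on a
-- partitioned table, the partition column goes last in the SELECT list.
INSERT INTO events
SELECT id, payload, dt
FROM staging_events
WHERE dt = '2024-01-01';

-- List the partitions currently registered in the catalog.
SHOW PARTITIONS events;
```

Running `SHOW PARTITIONS` before and after each step is one way to see when a partition is actually registered in the catalog.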

https://docs.aws.amazon.com/athena/latest/ug/msck-repair-table.html

https://docs.aws.amazon.com/athena/latest/ug/alter-table-add-partition.html

AWS
Alex_T
answered 2 years ago
  • If I don't ADD PARTITION in advance, and don't call MSCK REPAIR, is it still the case that "the data will be available to SELECT queries as soon as some of the files are added to those partitions by INSERT INTO SELECT"? That would essentially be the same as saying that in this scenario, the existence of a partition does not guarantee the corresponding INSERT INTO has finished writing the partition.
