Partitions disappearing


I have several partitioned tables. After a while, queries against them return no data, and I have to rerun MSCK REPAIR TABLE to make the partitions visible again.

My process runs ALTER TABLE ... ADD PARTITION whenever new data lands in S3. This works for a while, but eventually queries return 0 rows and I have to run the repair again.
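Here is a minimal sketch of the ADD PARTITION step. It builds one ALTER TABLE ... ADD IF NOT EXISTS PARTITION statement in the Hive DDL syntax that Athena accepts. The database, table, bucket, and path names are hypothetical. Restricting identifiers to lowercase letters, digits, and underscores is a conservative assumption on my part. The helper always emits a database-qualified table name, so the statement cannot silently resolve against some other default database.

```python
import re

# Conservative identifier rule (assumption): lowercase letters, digits, underscores.
_IDENT = re.compile(r"^[a-z0-9_]+$")


def qualified_name(database: str, table: str) -> str:
    """Return 'database.table', rejecting anything that is not a plain identifier."""
    for name in (database, table):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"{database}.{table}"


def sql_literal(value) -> str:
    """Render a partition value: ints unquoted, everything else single-quoted."""
    if isinstance(value, bool):
        raise TypeError("boolean partition values are not handled in this sketch")
    if isinstance(value, int):
        return str(value)
    text = str(value)
    if "'" in text:
        # Quote escaping differs between engines, so refuse rather than guess.
        raise ValueError(f"quote in partition value: {text!r}")
    return f"'{text}'"


def add_partition_sql(database: str, table: str, partition: dict, location: str) -> str:
    """Build ALTER TABLE db.table ADD IF NOT EXISTS PARTITION (...) LOCATION '...'.

    `partition` maps column name -> value, in PARTITIONED BY order.
    """
    if "'" in location:
        raise ValueError(f"quote in location: {location!r}")
    spec = ", ".join(f"{col} = {sql_literal(val)}" for col, val in partition.items())
    return (
        f"ALTER TABLE {qualified_name(database, table)} "
        f"ADD IF NOT EXISTS PARTITION ({spec}) LOCATION '{location}'"
    )


if __name__ == "__main__":
    print(add_partition_sql(
        "analytics", "events",
        {"v_id": "v1", "x_id": "x9", "year": 2019, "month": 12, "day": 18, "hour": 6},
        "s3://example-bucket/events/v_id=v1/x_id=x9/year=2019/month=12/day=18/hour=6/",
    ))
```

Because of IF NOT EXISTS, the statement can be rerun safely when the same S3 prefix is seen twice.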

The tables are partitioned with PARTITIONED BY (v_id string, x_id string, year int, month int, day int, hour int). Some tables are CSV and others are Avro, and I've seen this happen with both.
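MSCK REPAIR TABLE only discovers partitions stored under Hive-style key=value prefixes, so the S3 keys here presumably look like .../v_id=.../x_id=.../year=.../month=.../day=.../hour=.../file. Assuming that layout, a small parser can turn a newly written object key into the partition values and the LOCATION that the ADD PARTITION step needs. The column list mirrors the PARTITIONED BY clause above. The example key and bucket are hypothetical.

```python
# Partition columns and their types, in PARTITIONED BY order.
PARTITION_COLUMNS = [
    ("v_id", str), ("x_id", str),
    ("year", int), ("month", int), ("day", int), ("hour", int),
]


def partition_from_key(key: str) -> dict:
    """Extract partition values from a Hive-style S3 object key (assumed layout)."""
    found = {}
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep:
            found[name] = value
    result = {}
    for name, cast in PARTITION_COLUMNS:
        if name not in found:
            raise ValueError(f"key {key!r} has no {name}= segment")
        result[name] = cast(found[name])  # e.g. "06" -> 6 for int columns
    return result


def partition_location(bucket: str, key: str) -> str:
    """Return the s3:// prefix up to and including the last partition segment."""
    last = PARTITION_COLUMNS[-1][0] + "="
    segments = key.split("/")
    for i, segment in enumerate(segments):
        if segment.startswith(last):
            return f"s3://{bucket}/" + "/".join(segments[: i + 1]) + "/"
    raise ValueError(f"key {key!r} has no {last} segment")


if __name__ == "__main__":
    key = "events/v_id=v1/x_id=x9/year=2019/month=12/day=18/hour=06/part-0000.avro"
    print(partition_from_key(key))
    print(partition_location("example-bucket", key))
```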

Any suggestions? I could automate MSCK REPAIR, but that seems like the wrong solution.

Edited by: lettermuckoo on Dec 18, 2019 6:13 AM

Asked 4 years ago, 245 views
1 Answer

I found the issue.

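A job was recreating the tables during deploys. Recreating a table discards its partition metadata, although the data in S3 is untouched, so the partitions had to be reloaded afterwards. MSCK REPAIR TABLE was run after each recreate, but the statement did not fully qualify the table as database.tablename. As a result it did not run against the recreated table, and the existing partitions were never discovered.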
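Here is a sketch of the fix. It assumes the deploy job submits plain SQL strings. Every statement is built from one database-qualified name, so DROP, CREATE, and MSCK REPAIR TABLE all target the same table whatever the session's default database is. The function name, the {name} placeholder convention, and the example DDL are hypothetical. The placeholder is filled with str.replace rather than str.format because Avro DDL can contain JSON braces in its table properties.

```python
import re

# Conservative identifier rule (assumption): lowercase letters, digits, underscores.
_IDENT = re.compile(r"^[a-z0-9_]+$")


def qualified_name(database: str, table: str) -> str:
    """Return 'database.table', rejecting anything that is not a plain identifier."""
    for name in (database, table):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"{database}.{table}"


def deploy_statements(database: str, table: str, create_template: str) -> list:
    """Return the recreate-and-repair statements, all using the qualified name.

    `create_template` is the table's CREATE statement with the literal text
    {name} where the table name goes (hypothetical convention).
    """
    name = qualified_name(database, table)
    return [
        f"DROP TABLE IF EXISTS {name}",
        create_template.replace("{name}", name),
        # Re-register the partitions that already exist in S3.
        f"MSCK REPAIR TABLE {name}",
    ]


if __name__ == "__main__":
    ddl = ("CREATE EXTERNAL TABLE {name} (id string) "
           "PARTITIONED BY (year int) LOCATION 's3://example-bucket/events/'")
    for stmt in deploy_statements("analytics", "events", ddl):
        print(stmt)
```

If the job calls Athena through boto3, passing QueryExecutionContext={"Database": database} to start_query_execution also sets the default database. Qualifying the name in the SQL itself, however, works regardless of that setting.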

Edited by: lettermuckoo on Dec 18, 2019 1:56 PM

Answered 4 years ago
