partitions disappearing


I have several partitioned tables and have noticed that after a period of time queries return no data and I need to rerun MSCK REPAIR TABLE to make them visible again.

The process I have in place runs ALTER TABLE ... ADD PARTITION whenever new data is added to S3. This works for a while, but eventually queries return 0 rows and I am forced to run the repair.

The tables are partitioned with "PARTITIONED BY (v_id string, x_id string, year int, month int, day int, hour int)". Some tables are CSV and others are AVRO; I've seen this on both.
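For reference, a minimal sketch of how an add-partition statement for this partition layout could be generated. The database name, table name, and S3 location in the usage note are hypothetical, and the Hive-style DDL syntax (backtick-quoted identifiers, ADD IF NOT EXISTS PARTITION) is assumed rather than taken from the actual job.

```python
# Partition columns from the PARTITIONED BY clause above, in order.
# String columns get quoted values; int columns are emitted bare.
PARTITION_COLUMNS = [
    ("v_id", str),
    ("x_id", str),
    ("year", int),
    ("month", int),
    ("day", int),
    ("hour", int),
]


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, rejecting embedded backticks."""
    if not name or "`" in name:
        raise ValueError(f"invalid identifier: {name!r}")
    return f"`{name}`"


def add_partition_sql(database: str, table: str, values: dict, location: str) -> str:
    """Build an ALTER TABLE ... ADD IF NOT EXISTS PARTITION statement.

    The table is always emitted as database.table so the statement does not
    depend on whichever database the session happens to default to.
    """
    specs = []
    for name, typ in PARTITION_COLUMNS:
        value = values[name]  # KeyError if a partition key is missing
        if typ is str:
            text = str(value)
            if "'" in text:
                raise ValueError(f"unsupported quote in value for {name}")
            specs.append(f"{name} = '{text}'")
        else:
            specs.append(f"{name} = {int(value)}")
    if "'" in location:
        raise ValueError("unsupported quote in location")
    return (
        f"ALTER TABLE {quote_ident(database)}.{quote_ident(table)} "
        f"ADD IF NOT EXISTS PARTITION ({', '.join(specs)}) "
        f"LOCATION '{location}'"
    )
```

Calling `add_partition_sql("analytics", "events", {...}, "s3://example-bucket/events/...")` with one value per partition key returns a single statement string; how it is submitted to the query engine is left out here.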

Any suggestions? I could automate MSCK REPAIR, but that seems like the wrong solution.

Edited by: lettermuckoo on Dec 18, 2019 6:13 AM

asked 4 years ago · 245 views
1 Answer

I found the issue.

A job was recreating the tables during deploys. MSCK REPAIR TABLE was being run after each recreate, but the statement did not fully qualify the table name as database.tablename, so it was not discovering the existing partitions.
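A minimal sketch of the fix, assuming the deploy job builds its repair statement in code: always emit the table as database.tablename so the statement does not depend on the session's default database. The helper names and the example database and table are hypothetical, and backtick quoting assumes Hive-style DDL.

```python
def qualified_name(database: str, table: str) -> str:
    """Return `database`.`table`, rejecting empty or already-dotted parts."""
    for part in (database, table):
        if not part or "`" in part or "." in part:
            raise ValueError(f"invalid identifier: {part!r}")
    return f"`{database}`.`{table}`"


def repair_statement(database: str, table: str) -> str:
    """Build a fully qualified MSCK REPAIR TABLE statement."""
    return f"MSCK REPAIR TABLE {qualified_name(database, table)}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(repair_statement("analytics", "events"))
```

Requiring the database as a separate argument means a caller cannot accidentally omit it, which is the failure described above.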

Edited by: lettermuckoo on Dec 18, 2019 1:56 PM

answered 4 years ago
