Partitions disappearing

I have several partitioned tables and have noticed that after a period of time queries return no data and I need to rerun MSCK REPAIR TABLE to make them visible again.

The process I have in place runs ALTER TABLE ADD PARTITION whenever new data is added to S3. This works for a while, but eventually queries return 0 rows and I am forced to run the repair.

The tables are declared with "PARTITIONED BY (v_id string, x_id string, year int, month int, day int, hour int)". Some tables are CSV and others are Avro; I've seen this on both.
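For reference, the kind of statement the ADD PARTITION step issues could be built roughly like this. It is only a sketch: the database, table, bucket, and partition values are hypothetical placeholders, and the Hive-style key=value S3 layout is an assumption, not something stated above.

```python
def add_partition_sql(database, table, values, location):
    """Build an ALTER TABLE ... ADD PARTITION statement for one partition.

    `values` maps partition key -> value, in the order of PARTITIONED BY.
    String values are quoted, integer values are emitted bare.
    No escaping is done, so values are assumed to contain no quotes.
    """
    spec = ", ".join(
        f"{key}='{val}'" if isinstance(val, str) else f"{key}={val}"
        for key, val in values.items()
    )
    # The table is always written as database.table, never a bare name.
    return (
        f"ALTER TABLE {database}.{table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}'"
    )


# Hypothetical example values matching the partition keys above.
values = {"v_id": "v1", "x_id": "x1", "year": 2019, "month": 12, "day": 18, "hour": 6}
location = "s3://example-bucket/data/v_id=v1/x_id=x1/year=2019/month=12/day=18/hour=6/"
stmt = add_partition_sql("example_db", "example_table", values, location)
print(stmt)
```

IF NOT EXISTS keeps the statement safe to rerun when the same partition is registered twice.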

Any suggestions? I could automate MSCK REPAIR, but that seems like the wrong solution.

Edited by: lettermuckoo on Dec 18, 2019 6:13 AM

asked 4 years ago, 245 views
1 Answer
I found the issue.

A job was recreating the tables during deploys. MSCK REPAIR TABLE ran after each recreate, but the statement did not fully qualify the table as database.tablename, so it did not discover the existing partitions.
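A minimal sketch of how the deploy job could guard against this, assuming the repair statement is assembled in Python before being submitted. The function name and the example database and table names are hypothetical, not taken from the actual job.

```python
def repair_table_sql(database, table):
    """Build a MSCK REPAIR TABLE statement that always names the database."""
    if not database or not table:
        raise ValueError("both database and table are required")
    if "." in table:
        # Reject an already-qualified name so the database is never doubled.
        raise ValueError("pass the bare table name, not database.table")
    return f"MSCK REPAIR TABLE {database}.{table}"


# Hypothetical names; substitute the real database and table.
repair_stmt = repair_table_sql("example_db", "example_table")
print(repair_stmt)
```

Requiring the database as a separate argument makes it impossible to emit the unqualified form by accident.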

Edited by: lettermuckoo on Dec 18, 2019 1:56 PM

answered 4 years ago
