Athena error HIVE_BAD_DATA: Not valid Parquet file


Hello,

I initially created a Glue table with the Parquet SerDe.

Details:
input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
classification: parquet

Then I manually edited the table in the Glue console to use the JSON SerDe.

Details:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.openx.data.jsonserde.JsonSerDe
classification: json

The underlying S3 data is in JSON format.

Now, when I query the data in Athena, I get the following error:

HIVE_BAD_DATA: Not valid Parquet file

It seems Athena is not picking up the updated Glue details. I tried running MSCK REPAIR TABLE, but it did not help.

Is there a way I can fix it without deleting the table?

Thanks.

Asked a year ago, 1,472 views
1 answer

Figured it out.

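If I drop the existing partitions and add them again, Athena no longer gives the error. The reason is that each partition in the Glue Data Catalog keeps its own storage descriptor (input format, output format, SerDe), copied from the table when the partition was created, so editing the table alone does not change existing partitions. MSCK REPAIR TABLE only adds partitions that are missing from the catalog; it does not update ones that already exist.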
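The drop-and-re-add step can be scripted. The sketch below only builds the Athena DDL strings (`ALTER TABLE ... DROP IF EXISTS PARTITION` and `ALTER TABLE ... ADD IF NOT EXISTS PARTITION ... LOCATION`); it does not call AWS. It assumes string-typed partition columns, and the helper names, the table name `my_db.my_table`, the partition column `dt` and the S3 location are all hypothetical placeholders. The two statements are returned separately so they can be submitted to Athena one query at a time.

```python
from typing import Dict, List, Tuple


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def partition_clause(spec: Dict[str, str]) -> str:
    """Render {"dt": "2023-01-01"} as PARTITION (dt = '2023-01-01')."""
    cols = ", ".join(f"{col} = {sql_literal(val)}" for col, val in spec.items())
    return f"PARTITION ({cols})"


def drop_and_readd(table: str, partitions: List[Tuple[Dict[str, str], str]]) -> List[str]:
    """Build one DROP and one ADD statement covering all given partitions.

    `partitions` is a list of (partition spec, S3 location) pairs.
    Assumption: all partition columns are string-typed.
    """
    # DROP syntax separates multiple partitions with commas.
    drops = ", ".join(partition_clause(spec) for spec, _ in partitions)
    # ADD syntax separates multiple PARTITION ... LOCATION pairs with spaces.
    adds = " ".join(
        f"{partition_clause(spec)} LOCATION {sql_literal(location)}"
        for spec, location in partitions
    )
    return [
        f"ALTER TABLE {table} DROP IF EXISTS {drops}",
        f"ALTER TABLE {table} ADD IF NOT EXISTS {adds}",
    ]


if __name__ == "__main__":
    # Hypothetical table, partition and location, for illustration only.
    statements = drop_and_readd(
        "my_db.my_table",
        [({"dt": "2023-01-01"}, "s3://my-bucket/my_table/dt=2023-01-01/")],
    )
    for statement in statements:
        print(statement + ";")
```

Run the DROP before the ADD. Partitions added after the table edit pick up the table's current SerDe, which matches the behavior described in this answer.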

I also tried adding more partitions after this, and the new partitions now use the JSON SerDe.
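To see which partitions still carry the old Parquet settings before dropping anything, you could compare each partition's storage descriptor with the table's. This is a minimal sketch that works on plain dicts shaped like the `Table` element returned by the Glue `GetTable` API and the `Partitions` list returned by `GetPartitions` (for example via the boto3 Glue client's `get_table` and `get_partitions`, which are not called here). The function name `stale_partitions` and the sample partition values are hypothetical.

```python
from typing import Any, Dict, List


def stale_partitions(table: Dict[str, Any], partitions: List[Dict[str, Any]]) -> List[List[str]]:
    """Return the Values of partitions whose SerDe or InputFormat differs from the table's."""
    table_sd = table["StorageDescriptor"]
    wanted = (table_sd["SerdeInfo"]["SerializationLibrary"], table_sd["InputFormat"])
    stale = []
    for part in partitions:
        sd = part.get("StorageDescriptor", {})
        have = (sd.get("SerdeInfo", {}).get("SerializationLibrary"), sd.get("InputFormat"))
        if have != wanted:
            # This partition still has its own (old) storage settings.
            stale.append(part["Values"])
    return stale


if __name__ == "__main__":
    # Sample data mirroring the table settings in the question; partition values are hypothetical.
    table = {
        "StorageDescriptor": {
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "SerdeInfo": {"SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"},
        }
    }
    partitions = [
        {
            "Values": ["2023-01-01"],
            "StorageDescriptor": {
                "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
                "SerdeInfo": {"SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"},
            },
        },
        {
            "Values": ["2023-01-02"],
            "StorageDescriptor": {
                "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
                "SerdeInfo": {"SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"},
            },
        },
    ]
    print(stale_partitions(table, partitions))
```

With the sample data, only the partition created under the original Parquet settings is reported; those are the partitions to drop and re-add.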

Answered a year ago
