Athena error HIVE_BAD_DATA: Not valid Parquet file


Hello,

I initially created a Glue table with the Parquet SerDe.

Details:
input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
classification: parquet

Then I edited the table manually via the console to use the JSON SerDe.

Details:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.openx.data.jsonserde.JsonSerDe
classification: json

The underlying S3 data is in JSON format.

Now when I try to query the data via Athena, I get the error below:

HIVE_BAD_DATA: Not valid Parquet file

It seems Athena is not picking up the updated Glue details? I tried running MSCK REPAIR TABLE on the table, but it did not help.

Is there a way I can fix it without deleting the table?

Thanks.

asked a year ago, 1473 views
1 answer

Figured it out.

If I drop the existing partitions and add them again, Athena no longer returns the error.

I also added more partitions after this, and the new partitions now use the JSON SerDe.
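
For anyone hitting the same thing: this behaviour is consistent with the Glue Data Catalog keeping a separate storage descriptor (SerDe, input format) for each partition, so partitions created before the table edit most likely kept the Parquet settings. Below is a rough sketch of how the drop-and-re-add statements could be generated. The table name `my_table` and the partition key `dt` are made up for illustration, and the sketch assumes string-typed partition keys and Hive-style S3 paths (otherwise each ADD PARTITION needs an explicit LOCATION clause).

```python
def partition_spec(partition):
    """Render {'dt': '2023-01-01'} as dt='2023-01-01' (string-typed keys assumed)."""
    return ", ".join(f"{key}='{value}'" for key, value in partition.items())


def rebuild_partition_statements(table, partitions):
    """Return Athena DDL that drops and then re-adds each partition.

    Re-adding a partition makes it pick up the table's current SerDe settings.
    No LOCATION clause is emitted, so Hive-style paths under the table
    location are assumed.
    """
    statements = []
    for partition in partitions:
        spec = partition_spec(partition)
        statements.append(f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec})")
        statements.append(f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec})")
    return statements


if __name__ == "__main__":
    # Hypothetical table and partition values, for illustration only.
    for statement in rebuild_partition_statements("my_table", [{"dt": "2023-01-01"}]):
        print(statement)
```

Each printed statement can be run one at a time in the Athena query editor. Dropping a partition on an external table only removes the catalog entry; the JSON files in S3 are left in place.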

answered a year ago
