Error when querying a partition stored in Parquet format


Query Id: c4e3e87f-45b1-4ad1-a0fa-fadf179d6cbd

HIVE_BAD_DATA: Not valid Parquet file: s3://sherlock-inventory-usamazon/spear-prod-usamazon-inventory/Refund/dt=2022-12-28/sea-events-batched-archival_2022-12-28_2022-12-29_c7dc1157-9b94-468e-be84-d171a7404981.parquet expected magic number: PAR1 got: ��u�

1 Answer

Please let us know how this Parquet file was generated. These issues usually come from the different ways Parquet files can be created, some of which are not compatible with Athena. Athena uses the Hive Parquet SerDe, and as a result the SerDe expects all columns to be present in the source Parquet file. Some packages generate Parquet files that leave a column out entirely when it is blank in the data. For example, if no record has a value for the "x" column, the "x" column is omitted from the Parquet file itself. Also note that a Parquet file starts and ends with the 4-byte magic number PAR1, so the "expected magic number: PAR1" error means Athena found other bytes where that marker should be. The object may not be a plain Parquet file at all (for example, it may be compressed or written in another format), or it may be truncated.
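To make the column-omission behaviour concrete, here is a small Python sketch. It does not read real Parquet files: `columns_written` is a hypothetical stand-in for a writer that drops any column with no values in a batch, and `missing_columns` compares what such a writer would keep against the column list from the table DDL.

```python
from __future__ import annotations

from typing import Any, Iterable


def columns_written(records: Iterable[dict[str, Any]]) -> set[str]:
    """Columns a hypothetical 'omit blank columns' writer would keep:
    only those with at least one non-null value in the batch."""
    kept: set[str] = set()
    for record in records:
        kept.update(name for name, value in record.items() if value is not None)
    return kept


def missing_columns(table_columns: list[str], records: Iterable[dict[str, Any]]) -> list[str]:
    """Columns declared in the table DDL that such a writer would leave out of the file."""
    written = columns_written(records)
    return [name for name in table_columns if name not in written]


if __name__ == "__main__":
    ddl_columns = ["id", "x", "amount"]
    batch = [
        {"id": 1, "x": None, "amount": 9.5},
        {"id": 2, "x": None, "amount": 3.0},
    ]
    print(missing_columns(ddl_columns, batch))  # ['x']
```

If the list is non-empty for a batch, a file written from it would not contain every column the DDL declares.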

When Athena reads this file, it reads the metadata first and then the actual data. Here are a few troubleshooting suggestions:

  • Try changing the Athena engine version (under Amazon Athena > Workgroups > Manual > engine version 3).
  • Use S3 Select in the S3 console to check whether the data is formatted correctly.
  • Download the file to a Linux or Mac machine and use parquet-tools to confirm that it is a valid Parquet file.
  • Check the SerDe defined in the table DDL and make sure it is the correct one.
  • Process your data with AWS Glue (ETL programming), then write it out as Parquet files or directly into a Data Catalog table defined as Parquet.
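The S3 Select check can also be scripted. Below is a minimal boto3 sketch, assuming boto3 is installed and credentials with read access to the bucket are configured; `build_select_request` and `preview` are hypothetical helper names. It asks S3 Select to parse the object as Parquet and return a few rows as JSON lines; if the object is not valid Parquet, the call is expected to fail with an error instead of returning rows. Note that AWS closed S3 Select to new customers in July 2024, so it may be unavailable on newer accounts.

```python
from __future__ import annotations

from typing import Any


def build_select_request(bucket: str, key: str, limit: int = 5) -> dict[str, Any]:
    """Keyword arguments for S3.Client.select_object_content on a Parquet object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": f"SELECT * FROM S3Object s LIMIT {int(limit)}",
        "InputSerialization": {"Parquet": {}},
        "OutputSerialization": {"JSON": {}},
    }


def preview(bucket: str, key: str, limit: int = 5) -> str:
    """Return up to `limit` rows of the object as JSON lines."""
    import boto3  # imported here so build_select_request works without boto3

    s3 = boto3.client("s3")
    response = s3.select_object_content(**build_select_request(bucket, key, limit))
    chunks: list[bytes] = []
    for event in response["Payload"]:  # event stream: Records, Stats, End, ...
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Decode once at the end so multi-byte characters split across chunks stay intact.
    return b"".join(chunks).decode("utf-8")
```

For example, call `preview("sherlock-inventory-usamazon", key)` with `key` set to the object path from the error message.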
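Before reaching for parquet-tools, a quick structural check can show whether the downloaded object is Parquet at all. This sketch uses only the Python standard library and the Parquet file layout: 4-byte `PAR1` magic at the start, and at the end a 4-byte little-endian footer length followed by `PAR1` again. The function names and the gzip heuristic are illustrative assumptions, and passing this check does not prove the file is readable, so still confirm with parquet-tools.

```python
import os
import struct
import sys

PARQUET_MAGIC = b"PAR1"
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def classify(head: bytes, tail: bytes, size: int) -> str:
    """Classify a file from its first 4 bytes, last 8 bytes and total size."""
    if size < 12:  # header magic + footer length + footer magic
        return "too small to be Parquet"
    if head[:2] == GZIP_MAGIC:
        return "gzip-compressed data, not a bare Parquet file"
    if head != PARQUET_MAGIC:
        return f"bad header magic: {head!r}"
    if tail[4:] != PARQUET_MAGIC:
        return f"bad footer magic: {tail[4:]!r} (truncated or not Parquet)"
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        return f"footer length {footer_len} does not fit in a {size}-byte file"
    return f"structurally looks like Parquet (footer: {footer_len} bytes)"


def diagnose(path: str) -> str:
    """Read only the head and tail of `path` and classify it."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        if size >= 8:
            f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    return classify(head, tail, size)


if __name__ == "__main__":
    for file_path in sys.argv[1:]:
        print(f"{file_path}: {diagnose(file_path)}")
```

Run it as `python check_parquet.py <downloaded-file>.parquet`; only the first 4 and last 8 bytes are read, so large files are fine.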

AWS
answered a year ago