Athena: HIVE_BAD_DATA

I got this error after loading a Parquet file from S3 into Athena:

HIVE_BAD_DATA: Field metadata_file's type BINARY in parquet file s3://dil-google-books/data/dataset.parquet/part-00108-33efb610-5e72-4e37-b646-3b4ab4578bad-c000.snappy.parquet is incompatible with type boolean defined in table schema

Could you please suggest what I need to do?

CREATE EXTERNAL TABLE IF NOT EXISTS all_dataset (
  AccessDate string,
  Authors string,
  Chapter string,
  Chron string,
  City string,
  Degree string,
  Edition string,
  Encyclopedia string,
  Format string,
  ID_list string,
  Issue string,
  Pages string,
  Periodical string,
  PublicationPlace string,
  PublisherName string,
  Series string,
  SeriesNumber string,
  Title string,
  TitleType string,
  URL string,
  Volume string,
  citations string,
  id int,
  page_title string,
  r_id int,
  r_parentid int,
  sections string,
  type_of_citation string,
  updated_identifier array<string>,
  conf_score array<double>)
STORED AS PARQUET
LOCATION 's3:/'
tblproperties ("parquet.compress"="SNAPPY");
Asked 2 years ago, 3162 views
2 answers

There are several variants of the HIVE_BAD_DATA error. One common cause is that the data type defined in the table definition doesn't match the actual source data. Another is that a single field contains different types of data (for example, a boolean value in one record and a decimal value in another).

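In your case, you need to change the column's type in the table schema so that it matches the type stored in the file. The file stores metadata_file as BINARY, which Athena normally reads as string when the column holds text (Spark, for example, writes strings as BINARY annotated as UTF8), or as binary when it holds raw bytes.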
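
To make the mismatch concrete, here is a small Python sketch that compares a declared Athena column type with the Parquet type stored in the file. The mapping table, the function names suggested_athena_type and check_column, and the assumption that metadata_file holds UTF-8 text are all illustrative: the error message only reports the physical type BINARY, and the mapping covers just a few common types.

```python
from typing import Optional

# Illustrative (not exhaustive) lookup from a Parquet column's
# (physical type, logical annotation) to the Athena DDL type that usually reads it.
# Decimals, dates, timestamps and nested types are deliberately left out.
PARQUET_TO_ATHENA = {
    ("BOOLEAN", None): "boolean",
    ("INT32", None): "int",
    ("INT64", None): "bigint",
    ("FLOAT", None): "float",
    ("DOUBLE", None): "double",
    ("BINARY", "UTF8"): "string",    # older "converted type" name for text
    ("BINARY", "STRING"): "string",  # newer logical-type name for text
    ("BINARY", None): "binary",      # raw bytes with no string annotation
}


def suggested_athena_type(physical: str, logical: Optional[str] = None) -> str:
    """Return the Athena type for a Parquet column, or raise if it is not covered."""
    key = (physical.upper(), logical.upper() if logical else None)
    if key not in PARQUET_TO_ATHENA:
        raise ValueError(f"no mapping for Parquet type {key}")
    return PARQUET_TO_ATHENA[key]


def check_column(name: str, declared: str, physical: str,
                 logical: Optional[str] = None) -> Optional[str]:
    """Return a message if the declared type differs from the suggested one, else None.

    Compares exactly; it does not model any type widening Athena may tolerate.
    """
    expected = suggested_athena_type(physical, logical)
    if declared.lower() != expected:
        annotation = "/" + logical if logical else ""
        return (f"{name}: file has {physical}{annotation}, "
                f"table declares {declared}; declare it as {expected}")
    return None


# Assumes metadata_file is UTF8-annotated text; the error message does not say so.
print(check_column("metadata_file", "boolean", "BINARY", "UTF8"))
# metadata_file: file has BINARY/UTF8, table declares boolean; declare it as string
```

If the column turns out to hold raw bytes rather than text, calling check_column with logical=None suggests binary instead.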

I would also suggest reformatting your data with an AWS Glue ETL job. You can then query it in Athena through the Glue Data Catalog, or define the table directly with a query.

Ref Links:

https://aws.amazon.com/premiumsupport/knowledge-center/athena-hive-bad-data-parsing-field-value/

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format.html

Answered 2 years ago
Reviewed by an AWS expert 2 years ago

As I understand it, some of your files have this column typed as binary and others have it typed as boolean. If you declare the column as boolean, Athena will eventually read a file where the column is binary and throw this error, and vice versa. The solution is to make sure all of your files have the same schema.
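
One way to act on this is to compare the schemas of every file under the table's location and list the columns whose type differs between files. The sketch below is a hypothetical helper, conflicting_columns, that works on plain {column: type} dictionaries so it runs without dependencies. In practice you might fill those dictionaries from pyarrow.parquet.read_schema for each file; that is an assumption here, not something the answer prescribes. The file names are made up.

```python
from collections import defaultdict
from typing import Dict, List

# Maps a file path to its {column: type} schema. The type strings are just labels
# compared for equality, so use whatever names your schema reader reports.
Schemas = Dict[str, Dict[str, str]]


def conflicting_columns(schemas: Schemas) -> Dict[str, Dict[str, List[str]]]:
    """Return {column: {type: [files]}} for columns typed differently across files."""
    seen: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for path, schema in schemas.items():
        for column, col_type in schema.items():
            seen[column][col_type].append(path)
    # Keep only columns that appear with more than one type.
    return {col: dict(types) for col, types in seen.items() if len(types) > 1}


# Hypothetical per-file schemas for two files in the same table location.
files = {
    "part-00001.snappy.parquet": {"id": "INT32", "metadata_file": "BOOLEAN"},
    "part-00108.snappy.parquet": {"id": "INT32", "metadata_file": "BINARY"},
}
print(conflicting_columns(files))
# {'metadata_file': {'BOOLEAN': ['part-00001.snappy.parquet'], 'BINARY': ['part-00108.snappy.parquet']}}
```

The files listed under the type you don't want are the ones to rewrite (for example with the Glue job suggested in the other answer) so that every file matches the table schema.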

Answered 2 years ago