
Changing Schema Parquet


I'm facing the following error with a pipeline of two jobs and one crawler:

Job 1: changes the schema and saves the CSV file as Parquet in S3. Job 2: runs the ETL process. Crawler: registers the result as a table in the AWS Glue Data Catalog.

HIVE_BAD_DATA: Field werk's type BINARY in parquet file s3://jd-artikel/refined-02/run-1716794918260-part-block-0-r-00006-uncompressed.parquet is incompatible with type integer defined in table schema This query ran against the ‘jd-artikel-new-01’ database, unless qualified by the query.

I do not understand what is causing this.

asked 2 years ago · 1K views
1 Answer

Hi Salman,

The error message indicates that the field named "werk" in the Parquet file has the physical type BINARY (how Parquet stores strings), while the corresponding column in the Glue Data Catalog table is defined as an integer. The root cause is this mismatch between the Parquet file schema and the Glue table schema.

First, check the schema of the Parquet file produced by Job 1 and make sure the "werk" field is written with the intended data type. If the Parquet file schema is actually correct, update the Glue Data Catalog table schema to match it instead; you can do this in the AWS Glue console or through the AWS Glue API. For more guidance, see "Creating tables, updating the schema, and adding new partitions in the Data Catalog from AWS Glue ETL jobs" in the AWS Glue documentation.

AWS
answered 2 years ago

