ETL job failing with weird error

My ETL job is failing with the error below, which I found in the log. I'm not sure what is causing it. I would highly appreciate any advice.

Language: Python 3, Glue: 3

An error occurred while calling o93.parquet. java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary

Mark
asked 6 months ago, 209 views
1 Answer
Accepted Answer

Hello,

It seems you are getting an UnsupportedOperationException when reading the Parquet data. As far as I am aware, there are two likely causes: either the underlying Parquet file or files are corrupted, or the schema/data type is being interpreted incorrectly. If your S3 data source is partitioned, try reading different partitions separately and see whether the issue appears only when reading one particular partition. If the files are not corrupted, check whether any column is stored with a different type in some files than the schema expects; a mismatch like that can also throw this kind of exception. See this Jira: https://issues.apache.org/jira/browse/SPARK-24828
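
To narrow it down, a rough sketch like the one below may help. It reads each partition on its own, records the schema Spark infers for it, and then reports columns whose type differs between partitions. The function names and any paths you pass in are hypothetical (not part of any Spark or Glue API), and it assumes you already have a SparkSession called spark (in a Glue job this is usually glueContext.spark_session).

```python
from collections import defaultdict


def find_type_conflicts(schemas):
    """Return columns whose type differs across partitions.

    schemas: {partition_path: {column_name: type_string}}
    result:  {column_name: {type_string: [partition_paths]}}
    """
    seen = defaultdict(lambda: defaultdict(list))
    for path, columns in schemas.items():
        for column, dtype in columns.items():
            seen[column][dtype].append(path)
    return {
        column: {dtype: sorted(paths) for dtype, paths in by_type.items()}
        for column, by_type in seen.items()
        if len(by_type) > 1
    }


def collect_partition_schemas(spark, partition_paths):
    """Read each partition separately; record its schema or its read error.

    spark is an existing SparkSession (assumption: supplied by the caller).
    partition_paths is a list of S3 prefixes that you provide.
    """
    schemas, failures = {}, {}
    for path in partition_paths:
        try:
            df = spark.read.parquet(path)
            # df.dtypes is a list of (column_name, type_string) tuples.
            schemas[path] = dict(df.dtypes)
            # Writing to the "noop" sink (Spark 3.0+) forces every column
            # to be decoded without storing any output, so decoding errors
            # in this partition surface here.
            df.write.format("noop").mode("overwrite").save()
        except Exception as exc:
            failures[path] = str(exc)
    return schemas, failures
```

In the job you could call collect_partition_schemas(spark, paths) with your own list of partition prefixes, look at failures to see which partitions cannot be read at all, and pass schemas to find_type_conflicts to see which columns disagree on type. A partition that shows up in either result is a reasonable candidate to inspect or recreate.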

AWS
SUPPORT ENGINEER
answered 6 months ago
  • Thank you! I got the issue when querying one particular partition. I'm not sure of the exact cause, but I recreated that partition and the issue is resolved.
