ETL job failing with weird error


My ETL job is failing with the error below, which I found in the log. I'm not sure what is causing it. I would highly appreciate any advice.

Language: Python 3, Glue: 3

An error occurred while calling o93.parquet. java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary

Mark
asked 7 months ago, 231 views
1 Answer
Accepted Answer

Hello,

It looks like you are getting an UnsupportedOperationException when reading the Parquet data. As far as I am aware, there are two likely causes: either the underlying Parquet file(s) are corrupted, or the schema/data type is being interpreted incorrectly. If your S3 data source is partitioned, try reading different partitions individually and see whether the issue occurs only when reading specific partitions. If the files are not corrupted, check whether any column has a different data type in some files than in others, since that can also throw this kind of exception. See this Jira: https://issues.apache.org/jira/browse/SPARK-24828
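The two checks above could be sketched roughly as follows. This is only an illustration, not an official Glue utility: the helper names `find_failing_paths` and `find_type_conflicts`, the `read_fn` callback, and the shape of the `schemas` dictionary are all assumptions made for this example.

```python
from collections import defaultdict


def find_failing_paths(paths, read_fn):
    """Call read_fn on each path; return {path: error text} for paths that raise.

    read_fn is a hypothetical callback supplied by the caller. It should fully
    read the data at the given path and raise if the read fails.
    """
    failures = {}
    for path in paths:
        try:
            read_fn(path)
        except Exception as exc:  # broad on purpose: Spark surfaces Java errors as Python exceptions
            failures[path] = f"{type(exc).__name__}: {exc}"
    return failures


def find_type_conflicts(schemas):
    """Report columns that appear with more than one type.

    schemas is assumed to look like {path: {column_name: type_name}}.
    Returns {column_name: {type_name: [paths]}} for conflicting columns only.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for path, columns in schemas.items():
        for column, type_name in columns.items():
            seen[column][type_name].append(path)
    return {
        column: dict(types)
        for column, types in seen.items()
        if len(types) > 1
    }
```

In a Glue job with an existing `spark` session, `read_fn` might be something like `lambda p: spark.read.parquet(p).foreach(lambda _: None)`, which forces the rows to be decoded (a plain `count()` may not decode the column values). The schema for each path could be collected with `{f.name: f.dataType.simpleString() for f in spark.read.parquet(p).schema.fields}`. Any path reported by the first helper, or any column reported by the second, is a candidate for the partition that needs to be recreated.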

AWS
SUPPORT ENGINEER
answered 7 months ago
  • Thank you! The issue occurred when querying one particular partition. I'm not sure of the root cause, but I recreated that partition and the issue is resolved.
