ETL job failing with a weird error


My ETL job is failing with the error below, which I found in the log. I'm not sure what is causing it. I'd highly appreciate any advice.

Language: Python 3, Glue version: 3

An error occurred while calling o93.parquet. java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary

Mark
asked 7 months ago, 231 views
1 Answer
Accepted Answer

Hello,

It looks like you are getting an UnsupportedOperationException while reading the Parquet data. As far as I'm aware, there are two likely causes: either one or more of the underlying Parquet files are corrupted, or a column's schema/data type is being interpreted incorrectly. If your S3 data source is partitioned, try reading the partitions individually and check whether the error occurs only when reading a particular partition. If the files are not corrupted, check whether any column has a different data type in some files than in others (for example, the same column written with one type in one partition and another type in a different partition); that can also throw this kind of exception. See this Jira issue: https://issues.apache.org/jira/browse/SPARK-24828
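The partition-by-partition check described above can be scripted. Below is a minimal sketch, assuming you already have a list of partition paths and a Spark session in your Glue job; the names `scan_partitions` and `spark_schema_reader` are hypothetical helpers, not Glue or Spark APIs. It reads each partition on its own, records which ones fail, and flags columns whose type differs between the partitions that did read cleanly. The `noop` write is assumed to be available (it is a Spark 3.x data source) and is used only to force every column to be decoded, since a lazy read alone may not hit the bad pages.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

# Illustrative alias: a "schema" here is just column name -> type name.
Schema = Dict[str, str]


def scan_partitions(
    paths: Iterable[str],
    read_schema: Callable[[str], Schema],
) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Hypothetical helper: read each partition separately.

    Returns (failures, conflicts):
      failures  -> path -> error text for partitions that could not be read
      conflicts -> column -> {path: type} for columns whose type varies
    """
    failures: Dict[str, str] = {}
    schemas: Dict[str, Schema] = {}
    for path in paths:
        try:
            schemas[path] = read_schema(path)
        except Exception as exc:  # record the error and keep scanning
            failures[path] = f"{type(exc).__name__}: {exc}"

    # Gather every type observed for each column across readable partitions.
    seen: Dict[str, Dict[str, str]] = {}
    for path, schema in schemas.items():
        for column, dtype in schema.items():
            seen.setdefault(column, {})[path] = dtype

    # Keep only columns that do not have a single consistent type.
    conflicts = {
        column: by_path
        for column, by_path in seen.items()
        if len(set(by_path.values())) > 1
    }
    return failures, conflicts


def spark_schema_reader(spark: Any) -> Callable[[str], Schema]:
    """Hypothetical adapter that builds a read_schema function from a SparkSession."""

    def read_schema(path: str) -> Schema:
        df = spark.read.parquet(path)
        # The "noop" sink (Spark 3.x) scans every row and column without
        # writing output, so corrupt or mistyped pages should raise here.
        df.write.format("noop").mode("overwrite").save()
        return {f.name: f.dataType.simpleString() for f in df.schema.fields}

    return read_schema
```

In a Glue job you would call something like `scan_partitions(partition_paths, spark_schema_reader(spark))`, where `partition_paths` is your own list of S3 prefixes. A non-empty `failures` points to a partition worth inspecting or recreating, and a non-empty `conflicts` points to a column type mismatch between files.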

AWS
SUPPORT ENGINEER
answered 7 months ago
  • Thank you!! I hit the issue when querying a particular partition. I'm not sure of the root cause, but I recreated that partition and the issue is resolved.
