AWS Glue Schema mismatch


We are trying to use Glue to query and aggregate some Parquet files in S3.

We get this error, which is related to a schema mismatch:

An error occurred while calling o106.pyWriteDynamicFrame. org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException

We noticed that, in some of the Parquet files, one of the columns is sometimes read as int or binary instead of string.

I tried changing the schema using both the Glue visual tool and a Spark job in Glue ETL. I even tried dropping that column. But no matter what I do, I get that error, and it feels like there is no way to change this in Glue ETL.

Any help with fixing this, or with getting the script to ignore the mismatch (we don't even need that field), would really help. Thanks in advance!

  • The stack trace is very important here: it shows at which point the job is failing.

akshar
asked 10 months ago, 249 views
2 Answers

Shouldn't a SchemaMatch data quality rule catch this?

answered 6 months ago
  • Not directly. Data quality checks might notice that the data doesn't look right based on the rules or the history, but if the column has always had the wrong type, they might not detect it.
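Following that comment, an explicit check against the type you expect does not rely on history at all. Below is a minimal Python sketch, assuming the per-file schemas have already been collected into plain dicts of column name to type name (for example with `pyarrow.parquet.read_schema`, which is not shown). The helper names, file paths, and column name are hypothetical and for illustration only.

```python
from collections import defaultdict

MISSING = "<missing>"  # marker for files that do not contain the column


def find_type_mismatches(schemas, column, expected_type):
    """Return {file: actual_type} for files where `column` is not `expected_type`.

    `schemas` is a hypothetical mapping {file_path: {column_name: type_name}}.
    Files that lack the column entirely are reported with MISSING.
    """
    mismatches = {}
    for path, schema in sorted(schemas.items()):
        actual = schema.get(column, MISSING)
        if actual != expected_type:
            mismatches[path] = actual
    return mismatches


def group_by_type(schemas, column):
    """Group file paths by the type each file uses for `column`."""
    groups = defaultdict(list)
    for path, schema in sorted(schemas.items()):
        groups[schema.get(column, MISSING)].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Made-up example data: one column stored as string, int32, or binary.
    schemas = {
        "s3://example-bucket/part-0.parquet": {"id": "int64", "code": "string"},
        "s3://example-bucket/part-1.parquet": {"id": "int64", "code": "int32"},
        "s3://example-bucket/part-2.parquet": {"id": "int64", "code": "binary"},
    }
    print(find_type_mismatches(schemas, "code", "string"))
    print(group_by_type(schemas, "code"))
```

Grouping by type makes it easy to see whether the odd files come from one producer or one time range, which usually points at the process that wrote them.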


I suggest converting the data types of those columns in the process that produces the Parquet files. Make sure they all have the same data type before you do any further data manipulation.
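To illustrate the idea, here is a minimal, library-free sketch that coerces int, bytes, and str values of one column to a single string type, assuming rows are plain Python dicts. The function names are hypothetical. In a Glue job the equivalent step would be a Spark cast or `DynamicFrame.resolveChoice(specs=[("col", "cast:string")])`. However, if the exception is raised inside the Parquet reader itself, the conversion may have to happen in the process that writes the files, as suggested above.

```python
def to_string(value, encoding="utf-8"):
    """Coerce a value read as int, bytes, or str to a single string type.

    None is preserved so nulls stay nulls. Bytes are decoded with `encoding`;
    undecodable bytes are replaced instead of raising an error.
    """
    if value is None:
        return None
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding, errors="replace")
    return str(value)


def normalize_column(rows, column):
    """Return copies of the row dicts with `column` coerced to str."""
    out = []
    for row in rows:
        fixed = dict(row)  # copy so the input rows are not mutated
        if column in fixed:
            fixed[column] = to_string(fixed[column])
        out.append(fixed)
    return out


if __name__ == "__main__":
    # Made-up rows showing the three types seen for the same column.
    rows = [{"code": 42}, {"code": b"A7"}, {"code": "B9"}, {"code": None}]
    print([r["code"] for r in normalize_column(rows, "code")])
```

Keeping nulls as `None` rather than turning them into the string `"None"` avoids silently creating bogus values during the conversion.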

answered 9 months ago
