AWS Glue Schema mismatch


We are trying to use Glue to query and aggregate some Parquet files in S3.

We get this error related to schema mismatch:

An error occurred while calling o106.pyWriteDynamicFrame. org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException

We did notice that one of the columns in some of the Parquet files was sometimes read as int or binary instead of string.
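A minimal sketch for pinning down which files disagree on that column's type. It assumes you have already collected each file's schema into a plain dict of column name to type name, for example with pyarrow.parquet.read_schema(path) and str(schema.field(name).type). The function name find_type_outliers, the file names, and the column name code are hypothetical:

```python
from collections import Counter, defaultdict


def find_type_outliers(file_schemas, column):
    """Group files by the type recorded for `column`.

    file_schemas: dict mapping file path -> {column name: type name}.
    Returns (majority_type, {other_type: [paths]}); files lacking the column are skipped.
    """
    by_type = defaultdict(list)
    for path, schema in file_schemas.items():
        if column in schema:
            by_type[schema[column]].append(path)
    if not by_type:
        return None, {}
    # The most common type is treated as the "expected" one; everything else is an outlier.
    counts = Counter({type_name: len(paths) for type_name, paths in by_type.items()})
    majority, _ = counts.most_common(1)[0]
    outliers = {t: sorted(paths) for t, paths in by_type.items() if t != majority}
    return majority, outliers


# Hypothetical per-file schemas for the mismatched column.
schemas = {
    "part-0.parquet": {"code": "string"},
    "part-1.parquet": {"code": "string"},
    "part-2.parquet": {"code": "int32"},
    "part-3.parquet": {"code": "binary"},
}
print(find_type_outliers(schemas, "code"))
```

The outlier files are the ones that need to be rewritten or read separately.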

I tried changing the schema using both the Glue visual tool and a Spark job in Glue ETL, and I even tried dropping that column. But no matter what I do, I get that error, and it feels like there is no way to change this in Glue ETL.

Any help with fixing this or getting the script to ignore the mismatch (since we don't even need the field) would really help! Thanks in advance!

  • The stack trace is important here: it shows at which point the job is failing.

akshar
asked 10 months ago · 249 views
2 Answers

Shouldn't a SchemaMatch data quality rule catch this?

answered 6 months ago
  • Not directly. Data quality might notice that the data doesn't look right, based on the rules or the history, but if the data has always been wrong, it might not detect it.


I suggest converting the data type of those columns when you process the Parquet files. Make sure they all have the same data type before you perform any further data manipulation.
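A plain-Python sketch of the normalization this answer describes: coerce every value of the mismatched column to a string, decoding binary values as UTF-8 (an assumption about how the bytes were written). The helper names and the column name code are hypothetical. In PySpark the rough equivalent would be F.col("code").cast("string"), but the cast likely has to be applied to each group of files read separately with its own type, then combined with unionByName, because the exception is raised while the files are being read, before any later cast can run.

```python
def to_string(value, encoding="utf-8"):
    """Coerce one cell of the mismatched column to str; None stays None."""
    if value is None:
        return None
    if isinstance(value, (bytes, bytearray)):
        # Assumes binary cells hold text written without a string annotation.
        return bytes(value).decode(encoding, errors="replace")
    return str(value)


def normalize_rows(rows, column):
    """Return copies of dict rows with `column` coerced to str (rows without it are copied unchanged)."""
    return [
        {**row, column: to_string(row[column])} if column in row else dict(row)
        for row in rows
    ]


# Hypothetical rows mixing int, binary, and string values for the same column.
rows = [
    {"id": 1, "code": 42},
    {"id": 2, "code": b"A7"},
    {"id": 3, "code": "B9"},
    {"id": 4, "code": None},
]
print(normalize_rows(rows, "code"))
```

Decoding with errors="replace" keeps the job from failing on a stray non-UTF-8 value, at the cost of silently substituting replacement characters.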

answered 9 months ago