AWS Glue Schema mismatch

We are trying to use Glue to query and aggregate some Parquet files in S3.

We get this error related to schema mismatch:

An error occurred while calling o106.pyWriteDynamicFrame. org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException

We did notice that in some of the Parquet files, one of the columns is read as int or binary instead of string.

I tried changing the schema using both the Glue visual tool and a Spark job in Glue ETL, and I even tried dropping that column. No matter what I do, I get the same error, and it feels like there is no way to change this in Glue ETL.

Any help with fixing this or getting the script to ignore the mismatch (since we don't even need the field) would really help! Thanks in advance!

  • The full stack trace is very important here, because it shows at which point the job is failing.

2 Answers

The SchemaMatch data quality rule should capture this, correct?

answered 6 months ago
  • Not directly. Data quality might notice that the data doesn't look right based on the rules or the history, but if the data has always been wrong, it might not detect it.
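
One way to see which files disagree is to compare the Parquet footers directly, outside of Glue. This is only a minimal sketch: it assumes pyarrow is installed and that the files are readable at the paths you pass in. `find_type_conflicts` and `read_file_schemas` are hypothetical helper names, not Glue APIs, and the file names in the example are made up.

```python
from collections import defaultdict


def find_type_conflicts(file_schemas):
    """Return {column: {type: [files]}} for columns seen with more than one type.

    file_schemas maps a file path to a {column_name: type_name} dict.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for path, schema in file_schemas.items():
        for column, type_name in schema.items():
            seen[column][type_name].append(path)
    # Keep only the columns that appear with two or more distinct types.
    return {
        column: {type_name: sorted(files) for type_name, files in by_type.items()}
        for column, by_type in seen.items()
        if len(by_type) > 1
    }


def read_file_schemas(paths):
    """Read only the Parquet footers (no row data) with pyarrow."""
    import pyarrow.parquet as pq  # assumption: pyarrow is available

    schemas = {}
    for path in paths:
        schema = pq.read_schema(path)
        schemas[path] = dict(zip(schema.names, (str(t) for t in schema.types)))
    return schemas


if __name__ == "__main__":
    # Hypothetical schemas, written out by hand so the example runs without files.
    example = {
        "part-0001.parquet": {"id": "int64", "code": "string"},
        "part-0002.parquet": {"id": "int64", "code": "int32"},
    }
    print(find_type_conflicts(example))
```

With real data you would pass `read_file_schemas(paths)` to `find_type_conflicts` and look at which files are listed under the unexpected type.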

I suggest normalizing the data types of the columns in the process that produces those Parquet files. Make sure each column has the same data type in every file before you perform any further data manipulation.
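
If you cannot fix the files at the source, a possible workaround inside the job is to read each file on its own, so that Spark takes the schema from that file's footer, and then cast (or drop) the problem column before combining the results. The following is a rough sketch, not a tested Glue recipe: `read_with_uniform_column` is a hypothetical helper, `spark` is assumed to be the job's SparkSession, and it assumes every file has the same column names.

```python
from functools import reduce


def read_with_uniform_column(spark, paths, column, target_type="string", drop=False):
    """Read each Parquet path separately, then cast or drop one column.

    spark is a SparkSession; paths is a list of individual Parquet file paths.
    """
    frames = []
    for path in paths:
        # One read per file: the schema comes from this file alone rather
        # than from a single schema applied to all files.
        df = spark.read.parquet(path)
        if column in df.columns:
            if drop:
                df = df.drop(column)
            else:
                df = df.withColumn(column, df[column].cast(target_type))
        frames.append(df)
    if not frames:
        raise ValueError("no input paths given")
    # unionByName matches columns by name rather than by position.
    return reduce(lambda left, right: left.unionByName(right), frames)
```

In a Glue script, the resulting DataFrame could then be converted with `DynamicFrame.fromDF(df, glueContext, "normalized")` before writing. Reading file by file is slower than a single read over the whole prefix, so this is better suited to a one-off cleanup job that rewrites the data with a consistent schema.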

answered 9 months ago