AWS Glue Schema mismatch


We are trying to use Glue to query and aggregate some Parquet files in S3.

We get this error related to schema mismatch:

An error occurred while calling o106.pyWriteDynamicFrame. org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException

We noticed that, in some of the Parquet files, one of the columns was read as int or binary instead of string.
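One way to confirm which files disagree is to compare each file's schema and list the columns whose type differs between files. The sketch below is a hypothetical, stdlib-only helper (`find_type_conflicts` is not a Glue or Spark API). It assumes you have already collected a `{column: type_name}` mapping per file, for example from `pyarrow.parquet.read_schema(path)`. The file names and the `code` column are made up for illustration.

```python
from collections import defaultdict


def find_type_conflicts(file_schemas):
    """Return {column: {type_name: [files]}} for columns whose type differs across files.

    file_schemas maps a file path to a {column: type_name} dict.
    Columns that have a single type in every file are left out of the result.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for path, schema in file_schemas.items():
        for column, type_name in schema.items():
            seen[column][type_name].append(path)
    # Keep only columns that appear with more than one type.
    return {col: dict(types) for col, types in seen.items() if len(types) > 1}


# Hypothetical per-file schemas; in practice these might be built from
# pyarrow.parquet.read_schema(path) for each Parquet object.
schemas = {
    "part-000.parquet": {"id": "int64", "code": "string"},
    "part-001.parquet": {"id": "int64", "code": "int32"},
    "part-002.parquet": {"id": "int64", "code": "binary"},
}

conflicts = find_type_conflicts(schemas)
print(conflicts)
```

Here only `code` is reported, with each conflicting type mapped to the files that contain it, which tells you which files need to be rewritten or read separately.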

I tried changing the schema using both the Glue visual tool and a Spark job in Glue ETL, and I even tried dropping that column. No matter what I do, I get that error, and it feels like there is no way to change this in Glue ETL.

Any help fixing this, or getting the script to ignore the mismatch (we don't even need the field), would be much appreciated. Thanks in advance!

  • The stack trace is very important here, to see at which point it is failing.

akshar
Asked 10 months ago · Viewed 249 times
2 answers

A SchemaMatch data quality rule should catch this, correct?

Answered 6 months ago
  • Not directly. Data quality might notice that the data doesn't look right, based on the rules or on historical results, but if the data has always been wrong, it might not detect it.


I suggest normalizing the data types of those columns in the process that produces the Parquet files. Make sure each column has the same data type in every file before you perform any further data manipulation.
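As a rough illustration of normalizing at the source, the sketch below coerces one column to a single string type before the rows are written. It assumes the producing process is Python, that rows are plain dicts, and that binary values are UTF-8 encoded text; `to_string`, `normalize_column`, and the `code` column are hypothetical names. If the producer is a Spark job instead, the analogous step would be casting the column, e.g. `col("code").cast("string")`, before writing.

```python
def to_string(value, encoding="utf-8"):
    """Coerce an int, bytes, or str value to str; keep None as None.

    Assumes bytes values hold text in the given encoding.
    """
    if value is None:
        return None
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return str(value)


def normalize_column(rows, column):
    """Return copies of rows where `column` (if present) is always str or None."""
    normalized = []
    for row in rows:
        fixed = dict(row)  # copy so the input rows are not mutated
        if column in fixed:
            fixed[column] = to_string(fixed[column])
        normalized.append(fixed)
    return normalized


# Hypothetical rows where "code" arrives as str, int, bytes, or None.
rows = [
    {"id": 1, "code": "A7"},
    {"id": 2, "code": 42},
    {"id": 3, "code": b"X9"},
    {"id": 4, "code": None},
]

print(normalize_column(rows, "code"))
```

Writing the normalized rows means every file carries the same type for `code`, so downstream readers no longer see int or binary where they expect string.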

Answered 9 months ago
