AWS Glue Schema mismatch


We are trying to use Glue to query and aggregate some Parquet files in S3.

We get this error related to schema mismatch:

An error occurred while calling o106.pyWriteDynamicFrame. org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException

We did notice that one of the columns is read as int or binary instead of string in some of the Parquet files.

I tried changing the schema using both the Glue visual tool and a Spark job in Glue ETL, and I even tried dropping that column. No matter what I do, I get the same error, and it feels like there is no way to change this in Glue ETL.

Any help with fixing this or getting the script to ignore the mismatch (since we don't even need the field) would really help! Thanks in advance!

  • The stack trace is very important here, because it shows at which point the job is failing.

2 Answers

The SchemaMatch data quality rule should capture it, correct?

answered 6 months ago
  • Not directly. Data quality might notice that the data doesn't look right based on the rules or the history, but if it has always been wrong, it might not detect it.


I suggest fixing the data type of those columns in the process that writes the Parquet files. Make sure each column has the same data type in every file before you perform any further data manipulation.
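As a rough sketch of how that could be planned, the plain-Python helper below compares per-file schemas and lists which files would need the column cast to a common type. The bucket paths, the column names, and the schema dictionaries are hypothetical placeholders, and the two function names are made up for this illustration; they are not part of Glue or Spark.

```python
from collections import defaultdict


def find_mismatched_columns(file_schemas):
    """Return {column: {type: [files]}} for columns whose type differs across files."""
    types_by_column = defaultdict(lambda: defaultdict(list))
    for path, schema in file_schemas.items():
        for column, dtype in schema.items():
            types_by_column[column][dtype].append(path)
    # Keep only the columns that were seen with more than one type.
    return {
        column: {dtype: sorted(paths) for dtype, paths in by_type.items()}
        for column, by_type in types_by_column.items()
        if len(by_type) > 1
    }


def build_cast_plan(file_schemas, target_type="string"):
    """Return {file: [columns]} listing the columns each file must cast to target_type."""
    mismatched = find_mismatched_columns(file_schemas)
    plan = {}
    for path, schema in file_schemas.items():
        to_cast = sorted(
            column
            for column in mismatched
            if column in schema and schema[column] != target_type
        )
        if to_cast:
            plan[path] = to_cast
    return plan


# Hypothetical schemas (assumption): in practice these would be read
# from each Parquet file's footer rather than typed in by hand.
example_schemas = {
    "s3://example-bucket/data/part-0.parquet": {"id": "int64", "notes": "string"},
    "s3://example-bucket/data/part-1.parquet": {"id": "int64", "notes": "int32"},
    "s3://example-bucket/data/part-2.parquet": {"id": "int64", "notes": "binary"},
}

cast_plan = build_cast_plan(example_schemas)
print(cast_plan)
```

With a plan like this, each listed file could be read on its own, the flagged column cast to string (or left out if it is not needed), and the result written back, so that the aggregate job sees one consistent type. One way to collect the per-file schemas, assuming pyarrow is available in the job, would be `pyarrow.parquet.read_schema`.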

answered 9 months ago
