How can I fix this error: SchemaColumnConvertNotSupportedException?


Hello everyone, I just started using Glue, so forgive me if the question is stupid or I'm not providing the right information to solve the problem. I've been facing this issue for the past two days and I cannot seem to solve it.

I'm running a Glue job where I read a table from the Glue Catalog as a DynamicFrame and then turn it into a Spark DataFrame to create some views and preprocess the data the way I want. Every time I try writing my results to S3, converting the final DataFrame from Spark back to a Glue DynamicFrame, or even just calling df.show(), I get the error org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException.

I tried to dissect the query to find the mistake, and I found that even if I just load the data from the Glue Catalog (S3 data cataloged by a Crawler), turn it into a Spark DataFrame, create a temp view, run a trivial query (SELECT * FROM tempview), and try to write the result to S3, I still get this error. In the error logs I find something like this:

org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://bucket_name/folder_name/partition1=x/partition2=y/file.parquet. Column: [ColumnZ], Expected: string, Found: INT32

I really don't know how to fix this kind of error, especially since I get it just by performing simple operations like the ones described above. If someone could help me I would really appreciate it; I'm getting desperate.
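For reference, the pipeline described above boils down to something like the following Glue job sketch (the database and table names are placeholders; only the shape of the code matters):

```python
# Minimal reproduction sketch of the failing pipeline.
# "my_database" and "my_table" are placeholders, not the asker's real names.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the crawled table from the Glue Catalog as a DynamicFrame
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table",
)

# Convert to a Spark DataFrame and run a trivial query
df = dyf.toDF()
df.createOrReplaceTempView("tempview")
result = spark.sql("SELECT * FROM tempview")

# Any action that forces Spark to actually read the Parquet files
# (show, write, conversion back to DynamicFrame) triggers the
# SchemaColumnConvertNotSupportedException when file schemas conflict
result.show()

job.commit()
```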

Asked 6 months ago · Viewed 835 times
2 Answers

That means the schema of your files is inconsistent: the table declares that column as string, but some files actually store it as INT32, which is problematic in itself.
Assuming you can't fix the Parquet files to be consistent (or the table is partitioned and the files are only consistent within each partition), you might still be able to work around it.
Looking at the error, I would say you are reading as a DataFrame and not a DynamicFrame, which is more flexible in these respects.
Can you share the reading part of the code and the full stack trace?
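To illustrate the DynamicFrame route: a sketch assuming the mismatched column is ColumnZ (taken from the error message) and that casting everything to string is acceptable; the database and table names are placeholders:

```python
# Sketch: work around inconsistent Parquet types via DynamicFrame.
# "my_database" / "my_table" are placeholders; "ColumnZ" comes from the
# error message in the question.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# DynamicFrames resolve schemas per record instead of enforcing one file
# schema up front, so the string/INT32 conflict does not fail at read time
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table",
)

# Force the ambiguous column to a single type before converting to Spark
dyf = dyf.resolveChoice(specs=[("ColumnZ", "cast:string")])

df = dyf.toDF()
df.show()
```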

AWS
Expert
Answered 6 months ago

Has this problem been resolved? If so, what was the solution? Is it possible to rectify the data type using AWS Glue / Spark? How do we handle situations where data types vary across multiple files, particularly in Parquet format?

Answered 4 months ago
