How can I fix this error SchemaColumnConvertNotSupportedException?


Hello everyone, I just started using Glue, so forgive me if the question is stupid or I'm not providing the right information. I've been facing this issue for the past two days and cannot seem to solve it.

I'm running a Glue job where I read a table from the Glue Catalog as a DynamicFrame, convert it into a Spark DataFrame, create some views, and preprocess the data the way I want. Every time I try to write my results to S3, convert the final Spark DataFrame back to a Glue DynamicFrame, or even just call df.show(), I get the error org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException.

I tried to dissect the query to find the mistake, and found that the error occurs even with the simplest possible pipeline: load the data from the Glue Catalog (S3 data registered by a Crawler), convert it to a Spark DataFrame, create a temp view, run a trivial query (SELECT * FROM tempview), and write the result to S3. In the error logs I find something like this:

org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://bucket_name/folder_name/partition1=x/partition2=y/file.parquet. Column: [ColumnZ], Expected: string, Found: INT32

I really don't know how to fix this kind of error, also given that I get it just by performing simple operations as the ones described above. If someone could help me I would really appreciate it, I'm really desperate.

Asked 6 months ago · 835 views
2 Answers

That means the schema of your files is inconsistent: the crawler generalized that column to string, but some files actually store it as INT32, and that mismatch surfaces at read time.
Assuming you can't rewrite the Parquet files to be consistent (or the table is partitioned and the files are consistent within each partition), you might still be able to work around it.
Judging by the error, you are reading the data as a DataFrame rather than a DynamicFrame; DynamicFrames are more flexible about this kind of type conflict.
Can you share the reading part of the code and the full stack trace?
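To illustrate the DynamicFrame route mentioned above, here is a minimal Glue job sketch. It only runs inside an AWS Glue job, and the database, table, and column names are placeholders taken from the question, not the asker's real values:

```python
# AWS Glue job sketch: read via DynamicFrame and force a single type for
# the inconsistent column before any DataFrame conversion.
# (Runs only in the Glue runtime; names below are placeholders.)
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",   # placeholder
    table_name="my_table",    # placeholder
)

# resolveChoice casts ColumnZ to string in every record, so files that
# stored it as INT32 no longer clash with the catalog's string type.
dyf = dyf.resolveChoice(specs=[("ColumnZ", "cast:string")])

df = dyf.toDF()
df.createOrReplaceTempView("tempview")
result = glue_context.spark_session.sql("SELECT * FROM tempview")
result.show()
```

The key point is to apply resolveChoice on the DynamicFrame before calling toDF(), so the type conflict is resolved before Spark's Parquet reader pins a single expected type.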

AWS
EXPERT
Answered 6 months ago

Has this problem been resolved? If so, what was the solution? Is it possible to correct the data type using AWS Glue with Spark? How do we handle situations where data types vary across multiple files, particularly in Parquet format?

Answered 4 months ago
