How can I fix this SchemaColumnConvertNotSupportedException error?


Hello everyone, I just started using Glue, so forgive me if the question is basic or I'm not providing the right information. I've been facing this issue for the past two days and cannot seem to solve it. I'm running a Glue job that reads a table from the Glue Data Catalog as a DynamicFrame, converts it to a Spark DataFrame, creates some views, and preprocesses the data the way I want. Every time I try to write my results to S3, convert the final DataFrame back to a Glue DynamicFrame, or even just call df.show(), I get the error org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException.

I tried to dissect the query to find the mistake, and found that even if I just load the data from the Glue Catalog (S3 data registered by a Crawler), convert it to a Spark DataFrame, create a temp view, run a trivial query (SELECT * FROM tempview), and try to write the result to S3, I still get this error. The error logs show something like this:

org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://bucket_name/folder_name/partition1=x/partition2=y/file.parquet. Column: [ColumnZ], Expected: string, Found: INT32

I really don't know how to fix this kind of error, especially since it appears even for the simple operations described above. If someone could help me I would really appreciate it.

Asked 6 months ago · 835 views
2 Answers

That means the schema of your Parquet files is inconsistent: the table defines that column as string, but some files actually store it as INT32, which is problematic in itself.
Assuming you can't fix the Parquet files to be consistent (or the table is partitioned and the files are consistent within each partition), you might still be able to work around it.
Looking at the error, I would say you are reading the data as a DataFrame rather than a DynamicFrame, which is more flexible in these respects.
Can you share the reading part of your code and the full stack trace?
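A minimal sketch of that workaround, assuming a standard Glue job script (database, table, and column names below are placeholders taken from the question, not verified values): read through the catalog as a DynamicFrame, then use resolveChoice to collapse the int/string ambiguity before converting to a Spark DataFrame.

```python
# Requires the AWS Glue job runtime; not runnable locally.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

# DynamicFrames tolerate per-file type differences by tracking the
# conflicting column as a "choice" type instead of failing the read.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",   # placeholder
    table_name="my_table",    # placeholder
)

# Resolve the int/string choice on the offending column to one type.
dyf = dyf.resolveChoice(specs=[("ColumnZ", "cast:string")])

# Only convert to a Spark DataFrame after the types are resolved.
df = dyf.toDF()
df.show()
```

The key point is ordering: the cast has to happen on the DynamicFrame, before any Spark SQL action forces a column-by-column Parquet read.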

AWS
EXPERT
Answered 6 months ago

Has this problem been resolved? If so, what was the solution? Is it possible to rectify the data type using AWS Glue with Spark? How do we handle situations where data types vary across multiple files, particularly in Parquet format?
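As a toy illustration of what "rectifying" a mixed-type column means (plain Python, not the Glue API): pick one target type and cast every record to it before any downstream processing, which is essentially what a `cast:string` resolution does per column.

```python
def resolve_column(records, column, target=str):
    """Cast `column` in each record dict to `target`, so downstream
    code sees one consistent type (None values are left alone)."""
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are not mutated
        if rec.get(column) is not None:
            rec[column] = target(rec[column])
        out.append(rec)
    return out

# Rows as they might arrive from two differently-typed files.
mixed = [{"ColumnZ": 42}, {"ColumnZ": "43"}, {"ColumnZ": None}]
print(resolve_column(mixed, "ColumnZ"))
# → [{'ColumnZ': '42'}, {'ColumnZ': '43'}, {'ColumnZ': None}]
```

In Spark itself the equivalent fix is an explicit cast (or reading with a user-supplied schema) so that the files are never decoded under a conflicting expected type.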

Answered 4 months ago
