“Parquet column cannot be converted in file, Pyspark Expected string Found: INT32.”


I encountered the following error: "Parquet column cannot be converted in file, Pyspark Expected string Found: INT32." I tried converting the column to INT32 (using withColumn()), but the error persisted. I also tried adding the statement spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false"), but that did not help either. I would very much appreciate your insights. Thanks.

Asked 3 months ago, viewed 729 times
1 Answer

That error means the schema Spark is using doesn't match the file. This can happen when you read through a catalog table whose definition doesn't match the data, or when the same directory contains Parquet files with inconsistent schemas. If you do have mixed files, I would try reading with "mergeSchema"=true, though I'm not sure that will solve it; you might need to tell the files apart and read them separately.
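
For the "tell the files apart and read them separately" route, here is a minimal sketch rather than a definitive fix. It assumes you read the files directly by path (not through a catalog table), that you can list the individual file paths yourself, and that only one column differs in type between files. The helper names, the column name and the paths in the usage note are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def group_paths_by_column_type(paths, column_type_of):
    """Group file paths by the type that column_type_of(path) reports."""
    groups = defaultdict(list)
    for path in paths:
        groups[column_type_of(path)].append(path)
    return dict(groups)


def read_with_uniform_type(spark, paths, column, target_type="string"):
    """Read each same-typed group of files on its own, cast `column`,
    then union the groups by column name."""

    def column_type_of(path):
        # Reading one file at a time makes Spark take the schema
        # from that file instead of from a table definition.
        return dict(spark.read.parquet(path).dtypes)[column]

    groups = group_paths_by_column_type(paths, column_type_of)

    frames = []
    for file_paths in groups.values():
        df = spark.read.parquet(*file_paths)
        # Cast after reading, so every group ends up with the same type.
        frames.append(df.withColumn(column, df[column].cast(target_type)))

    # Assumption: all other columns already agree across the groups.
    return reduce(lambda left, right: left.unionByName(right), frames)
```

Usage would look something like read_with_uniform_type(spark, ["s3://my-bucket/data/part-0.parquet", "s3://my-bucket/data/part-1.parquet"], "my_column"), with your own paths and column name. The point of the design is that the cast happens after each group has been read with its own schema; applying withColumn() to a DataFrame that was read with the mismatched schema does not help, because the failure occurs while the file is being read.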

AWS (Expert), answered 3 months ago
