“Parquet column cannot be converted in file, Pyspark Expected string Found: INT32.”


I encountered the following error: “Parquet column cannot be converted in file, Pyspark Expected string Found: INT32.” I tried to convert the column to INT32 by applying withColumn(), but the error persisted. I also tried adding the statement spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false"), but that did not help either. I would very much appreciate your insights. Thanks.

asked 3 months ago, 734 views
1 Answer

That error means the schema Spark expects doesn't match the schema stored in the file. This can happen when you read through a catalog table whose definition doesn't match the data, or when the same directory contains Parquet files with inconsistent schemas. If you do have mixed files, I would try reading with "mergeSchema"=true, but I'm not sure that will solve it; you might need to tell the files apart and read them separately.
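A rough sketch of the "tell the files apart and read them separately" idea, in PySpark. The helper names, the `target_type` default, and any paths or column names you pass in are hypothetical, not taken from the question. It assumes the mismatch is limited to one column and that casting it to a common type is acceptable for your data. Each file is first read on its own so Spark uses that file's own schema, then files with the same column type are read together, cast, and unioned. A plain withColumn() cast on the original DataFrame likely cannot help, because the failure seems to happen while the file is being decoded, before the cast is applied.

```python
from collections import defaultdict
from functools import reduce


def group_paths_by_type(type_by_path):
    """Group file paths by the type reported for the problem column."""
    groups = defaultdict(list)
    for path, col_type in sorted(type_by_path.items()):
        groups[col_type].append(path)
    return dict(groups)


def column_type_by_file(spark, paths, column):
    """Read each file alone so Spark infers the schema from that file only."""
    return {
        path: spark.read.parquet(path).schema[column].dataType.simpleString()
        for path in paths
    }


def read_with_unified_type(spark, paths, column, target_type="string"):
    """Read each same-typed group separately, cast the column, then union."""
    types = column_type_by_file(spark, paths, column)
    frames = []
    for _col_type, group in group_paths_by_type(types).items():
        df = spark.read.parquet(*group)
        frames.append(df.withColumn(column, df[column].cast(target_type)))
    # unionByName matches columns by name rather than by position
    return reduce(lambda left, right: left.unionByName(right), frames)
```

With an existing SparkSession, a call would look like `read_with_unified_type(spark, list_of_file_paths, "my_column")`, where both arguments are placeholders for your own values. If the data is read through a catalog table instead, the table definition itself may need to be corrected to match the files.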

AWS EXPERT
answered 3 months ago
