“Parquet column cannot be converted in file, Pyspark Expected string Found: INT32.”


I encountered the following error: “Parquet column cannot be converted in file, Pyspark Expected string Found: INT32.” I tried to convert the column to INT32 (by applying withColumn()), but the error persisted. I also tried adding the statement spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false"), but that did not help either. I would very much appreciate your insights. Thanks.

asked 3 months ago · 735 views
1 Answer

That error means the schema Spark expects doesn't match the schema stored in the file. This can happen when you read through a catalog table whose definition doesn't match the data, or when the same directory contains Parquet files with inconsistent schemas. If you do have mixed files, I would try reading with "mergeSchema"=true, but I'm not sure it will solve it; you might need to tell the files apart and read them separately.
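As a rough illustration of "telling the files apart", here is a minimal sketch, not a definitive fix. It assumes you can list the individual Parquet file paths yourself and that the files differ only in the type of one column. The helper names `group_paths_by_type` and `read_with_unified_column`, and the choice of `string` as the target type, are made up for this example. Note that a cast added with withColumn() only runs after the files have been scanned, which is probably why casting alone did not make the error go away.

```python
from collections import defaultdict


def group_paths_by_type(column_types):
    """Group file paths by the type name found for one column.

    column_types maps a file path to a type name, for example
    {"part-0.parquet": "int", "part-1.parquet": "string"}.
    """
    groups = defaultdict(list)
    for path, type_name in column_types.items():
        groups[type_name].append(path)
    return dict(groups)


def read_with_unified_column(spark, paths, column, target_type="string"):
    """Read files group by group, cast `column` to one type, then union.

    Assumption: all other columns are identical across the files;
    otherwise unionByName will raise an error.
    """
    # Imported here so the grouping helper above works without pyspark.
    from pyspark.sql import functions as F

    # Read each file on its own so Spark uses that file's own schema.
    column_types = {
        p: spark.read.parquet(p).schema[column].dataType.simpleString()
        for p in paths
    }

    frames = []
    for _type_name, group in group_paths_by_type(column_types).items():
        df = spark.read.parquet(*group)
        # Within a group the column has one consistent type, so the cast is safe.
        frames.append(df.withColumn(column, F.col(column).cast(target_type)))

    result = frames[0]
    for df in frames[1:]:
        result = result.unionByName(df)
    return result
```

A call might look like `read_with_unified_column(spark, paths, "my_column")`, where `paths` and `my_column` are placeholders for your own file list and column name. If the data is read through a catalog table instead, the table definition itself would need to be corrected to match the files.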

AWS EXPERT
answered 3 months ago
