Glue job fails because of file encoding


Hi team,

I have a Glue job that reads a CSV file from S3 and loads it into a DB.

The job fails with the error below.

I think it is related to the file encoding.

The original file encoding is **ISO-8859-1**.

If I manually change the file encoding to UTF-8, the Glue job passes.

Does the CSV file need to be encoded in UTF-8 for the job to run successfully? Is there any way to work around this?

Thank you!

Unable to parse file: myFile.csv\n\n\tat\n\tat\n\tat\n\tat\n\tat org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:247)\n\tat org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)\n\tat

Hi, you need to convert the file's character encoding from ISO-8859-1 to UTF-8 before AWS Glue processes it.

Text-based data, such as CSVs, must be encoded in UTF-8 for AWS Glue to process it successfully.

There are a few examples listed here that use Spark to convert the character encoding.
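
As a rough illustration of the conversion step itself, here is a minimal sketch in plain Python. It assumes the file has already been copied from S3 to local storage; the function name `transcode_to_utf8` and the file paths are hypothetical, and this is not taken from the linked examples.

```python
import shutil


def transcode_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Rewrite a text file as UTF-8, decoding it from src_encoding.

    Hypothetical helper for illustration only. newline="" keeps the
    original line endings untouched so quoted CSV fields are not altered.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Stream in chunks so large files do not need to fit in memory.
        shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    # Assumed local paths; replace with wherever the job stages the file.
    transcode_to_utf8("myFile.csv", "myFile_utf8.csv")
```

The UTF-8 copy would then be uploaded back to S3 and used as the job's input. If the job reads the file through Spark's DataFrame CSV reader rather than a Glue DynamicFrame, that reader's `encoding` option may be another route worth checking.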

