Glue job fails because of file encoding
Hi team,
I have a Glue job that reads a CSV file from S3 and inserts it into a database.
The job fails with the following error, and I think it is related to the file encoding.
The original file encoding is **ISO-8859-1**.
If I manually change the file encoding to UTF-8, the Glue job passes.
Does the CSV file need to be encoded in UTF-8 for the job to run successfully? Is there any way around this?
Thank you!
com.amazonaws.services.glue.util.FatalException: Unable to parse file: myFile.csv

	at com.amazonaws.services.glue.readers.JacksonReader.hasNextFailSafe(JacksonReader.scala:94)
	at com.amazonaws.services.glue.readers.JacksonReader.hasNext(JacksonReader.scala:38)
	at com.amazonaws.services.glue.readers.CSVReader.hasNext(CSVReader.scala:169)
	at com.amazonaws.services.glue.hadoop.TapeHadoopRecordReaderSplittable.nextKeyValue(TapeHadoopRecordReaderSplittable.scala:97)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:247)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at
Hi, you need to convert the character encoding from ISO-8859-1 to UTF-8 before letting AWS Glue process the file. From the AWS Glue documentation:
https://docs.aws.amazon.com/glue/latest/dg/components-key-concepts.html
Text-based data, such as CSVs, must be encoded in UTF-8 for AWS Glue to process it successfully.
There are a few examples listed here: https://github.com/aws-samples/aws-glue-samples/blob/master/examples/converting_char_encoding.md. They use Spark to convert the character encoding.
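If the file is small enough to preprocess outside Spark, the conversion can also be done with plain Python before the Glue job reads it. The following is only a minimal sketch, not taken from the linked samples: the function name `transcode_to_utf8` and its parameters are hypothetical, it works on local paths, and downloading the object from S3 and uploading the result are left out.

```python
import shutil


def transcode_to_utf8(src_path, dst_path, source_encoding="iso-8859-1"):
    """Rewrite a text file as UTF-8 (hypothetical helper, local paths only).

    The file is copied in chunks, so it is never loaded into memory at once.
    """
    # newline="" disables newline translation, so line endings are preserved.
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        shutil.copyfileobj(src, dst)
```

For example, `transcode_to_utf8("myFile.csv", "myFile_utf8.csv")` writes a UTF-8 copy that can then be placed at the S3 location the job reads from. The source encoding has to be known in advance, as it is here; the sketch does not try to detect it.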