AWS-GLUE encoding utf-8

0

I have a problem when I submit a job in Glue. My data in S3 contains ñ and accents, which causes the job to fail with: unable to parse file s3://...
If I edit my job to add the Python header:
# -*- coding: utf-8 -*-
the job still fails.

Any idea? Thank you beforehand.

Edited by: anaP on Jan 24, 2018 4:10 AM

anaP
asked 6 years ago · 3336 views
7 Answers
0

Hi,

I am facing the same problem. Were you able to find a fix for this? I really do not want to switch to the Spark DataFrame API at this point, after spending so much time making the Glue Data Catalog perfect.

answered 6 years ago
0

Specify the encoding at the top of your script:

import sys

reload(sys)  # re-import sys so setdefaultencoding is exposed again (site.py removes it at startup)
sys.setdefaultencoding("utf-8")  # Python 2 only; this API was removed in Python 3

I used this in my job and it resolved the error.

answered 5 years ago
0

I have discussed this with AWS technical support and there is no solution using DynamicFrames. You need to rewrite with the Spark DataFrame API, I'm afraid...
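
A minimal sketch of that rewrite, assuming a CSV source in ISO-8859-1 (the encoding, bucket path, and frame name are all placeholders, not values from this thread). The DataFrame reader accepts an encoding option, and you can convert back to a DynamicFrame afterwards to keep using Glue transforms:

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read with the source's real encoding; the DynamicFrame reader cannot do this
df = (spark.read
      .option("header", "true")
      .option("encoding", "ISO-8859-1")  # assumed source encoding
      .csv("s3://my-bucket/input/"))

# Convert back so the rest of the job can keep using Glue transforms
dyf = DynamicFrame.fromDF(df, glue_context, "reencoded")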

chriskl
answered 5 years ago
0

I've tried the setdefaultencoding suggestion above and it doesn't work...

chriskl
answered 5 years ago
0

I have the same problem. Is a fix still in the works? Has anybody found a working solution?

answered 4 years ago
0

Currently, Glue DynamicFrame supports custom encoding for XML, but not for other formats like JSON or CSV.
If your data includes non-UTF-8 characters, you can use the Spark DataFrame API to read the data with its actual encoding and write it back to S3 as UTF-8.

You can refer to the samples in the repository below.
https://github.com/aws-samples/aws-glue-samples/blob/master/examples/converting_char_encoding.md
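
For illustration, a minimal sketch of the read-and-rewrite approach those samples describe (the source encoding and S3 paths here are assumptions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the files with their actual encoding (ISO-8859-1 assumed here)
df = (spark.read
      .option("header", "true")
      .option("encoding", "ISO-8859-1")
      .csv("s3://my-bucket/source/"))

# Spark writes CSV as UTF-8 by default, so writing back re-encodes the data
(df.write
   .mode("overwrite")
   .option("header", "true")
   .csv("s3://my-bucket/utf8-copy/"))

Once the UTF-8 copy exists, the original DynamicFrame-based job can read it without parse errors.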

AWS
answered 4 years ago
0

My Glue job is also failing with an "unable to parse" error when trying to process an ANSI-formatted file. Any solution?

Adarsh
answered a year ago
