AWS Glue: UTF-8 encoding

0

I have a problem when I submit a job in Glue. My data in S3 contains ñ and accented characters, which causes the job to fail with: unable to parse file s3://...
If I edit my job to add the Python encoding header:
# -*- coding: utf-8 -*-
the job fails too.

Any ideas? Thank you in advance.

Edited by: anaP on Jan 24, 2018 4:10 AM

anaP
asked 6 years ago · 3,411 views
7 answers
0

Hi,

I am facing the same problem. Were you able to find a fix for this? I really do not want to use the Spark DataFrame API at this point after spending so much time making the Glue Data Catalog perfect.

answered 6 years ago
0

Specify the encoding at the top of your script:

import sys

# Python 2 only: reload(sys) restores setdefaultencoding(),
# which site.py removes at startup, then resets the default
# string encoding to UTF-8.
reload(sys)
sys.setdefaultencoding("utf-8")

I used this in my job and it resolved the error.
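A caveat on the snippet above: reload(sys) and sys.setdefaultencoding() exist only on Python 2, so this trick does not carry over to Python 3 Glue jobs (Glue 2.0 and later). On Python 3, strings are already Unicode; instead of changing a global default, you decode bytes explicitly where they enter your script. A minimal illustration (the sample string is made up):

```python
# -*- coding: utf-8 -*-

# Bytes as they might arrive from an S3 object containing ñ and accents.
raw = "año con ñ y acentos: café".encode("utf-8")

# Python 3 equivalent of the global-default hack: an explicit decode
# with the encoding you know the data actually uses.
text = raw.decode("utf-8")

print(text)  # -> año con ñ y acentos: café
```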

answered 5 years ago
0

I have discussed this with AWS technical support, and there is no solution using DynamicFrames; you need to rewrite, I'm afraid...

chriskl
answered 5 years ago
0

I've tried this and it doesn't work...

chriskl
answered 5 years ago
0

I have the same problem. Is this still in the works? Anybody found a working solution?

answered 4 years ago
0

Currently, Glue DynamicFrame supports a custom encoding for XML, but not for other formats like JSON or CSV.
If your data includes non-UTF-8 characters, you can use a DataFrame to read the data and write it back to S3 as UTF-8.

You can refer to the samples in the repository below:
https://github.com/aws-samples/aws-glue-samples/blob/master/examples/converting_char_encoding.md

AWS
answered 4 years ago
0

My Glue job is also failing with an "unable to parse" error when trying to process an ANSI-formatted file. Any solution?

Adarsh
answered a year ago
