AWS-GLUE encoding utf-8

0

I have a problem when I submit a job in Glue. My data in S3 contains ñ and accented characters, which causes the job to fail: unable to parse file s3://...
If I edit my job with the Python header:
# -*- coding: utf-8 -*- the job fails too.

Any idea? Thank you beforehand.

Edited by: anaP on Jan 24, 2018 4:10 AM

anaP
posted 6 years ago · 3,411 views
7 Answers
0

Hi,

I am facing the same problem. Were you able to find a fix for this? I really do not want to switch to the Spark DataFrame API at this point after spending so much time making the Glue Data Catalog perfect.

answered 6 years ago
0

Specify the encoding at the top of your script:

import sys

# Python 2 only: reload(sys) restores sys.setdefaultencoding,
# which site.py deletes at interpreter startup. Neither call
# exists in Python 3, where the default encoding is already UTF-8.
reload(sys)
sys.setdefaultencoding("utf-8")

I used this in my job and it resolved the error.
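Note that the snippet above only works on Python 2 Glue jobs. On Python 3, sys.setdefaultencoding was removed and the default encoding is already UTF-8, which you can verify:

```python
import sys

# In Python 3 the default string encoding is UTF-8 and cannot
# (and need not) be changed at runtime.
print(sys.getdefaultencoding())  # -> utf-8
```

So if your job runs on Python 3 and still fails to parse, the problem is the encoding of the source files, not the interpreter default.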

answered 5 years ago
0

I have discussed this with AWS technical support and there is no solution using DynamicFrames; you need to rewrite with the DataFrame API, I'm afraid...

chriskl
answered 5 years ago
0

I've tried this and it doesn't work...

chriskl
answered 5 years ago
0

I have the same problem. Is this still in the works? Anybody found a working solution?

answered 4 years ago
0

Currently, Glue DynamicFrame supports custom encoding for XML, but not for other formats such as JSON or CSV.
If your data includes non-UTF-8 characters, you can use the Spark DataFrame API to read the data and write it back to S3 as UTF-8.

You can refer to the samples in the repository below:
https://github.com/aws-samples/aws-glue-samples/blob/master/examples/converting_char_encoding.md

AWS
answered 4 years ago
0

My Glue job is also failing with an "unable to parse" error when trying to process an ANSI-formatted file. Any solution?

Adarsh
answered a year ago
