AWS Glue encoding UTF-8

I have a problem when I submit a job in Glue. My data in S3 contains ñ and accented characters, which causes the job to fail: unable to parse file s3://...
If I edit my job to add the Python encoding header:
# -*- coding: utf-8 -*-
the job still fails.

Any ideas? Thank you in advance.

Edited by: anaP on Jan 24, 2018 4:10 AM

anaP
Asked 6 years ago | 3,411 views
7 Answers

Hi,

I am facing the same problem. Were you able to find a fix for this? I really do not want to switch to the Spark DataFrame API at this point, after spending so much time getting the Glue Data Catalog right.

Answered 6 years ago

Specify the encoding at the top of your script:

import sys

# Python 2 only: site.py removes sys.setdefaultencoding at startup,
# so the module must be reloaded to get it back. Python 3 (and Glue
# versions that run Python 3) removed this function entirely.
reload(sys)
sys.setdefaultencoding("utf-8")

I used this in my job and it resolved the error.

Answered 5 years ago

I have discussed this with AWS technical support and there is no solution using DynamicFrames; you need to rewrite, I'm afraid...

chriskl
Answered 5 years ago

I've tried this and it doesn't work...

chriskl
Answered 5 years ago

I have the same problem. Is a fix still in the works? Has anybody found a working solution?

Answered 4 years ago

Currently, Glue DynamicFrame supports custom encodings for XML, but not for other formats like JSON or CSV.
If your data includes non-UTF-8 characters, you can use a Spark DataFrame to read the data and write it back to S3 as UTF-8.

You can refer to the samples in the repository below:
https://github.com/aws-samples/aws-glue-samples/blob/master/examples/converting_char_encoding.md
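
For example, here is a minimal sketch of that approach (not taken from the linked sample), assuming a CSV encoded in ISO-8859-1 and placeholder S3 paths:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read with the Spark DataFrame reader, declaring the source encoding.
# "ISO-8859-1" is an assumption; match it to your actual data.
df = (spark.read
    .option("header", "true")
    .option("encoding", "ISO-8859-1")
    .csv("s3://my-bucket/raw/"))  # placeholder input path

# Spark writes CSV as UTF-8 by default, so a plain rewrite re-encodes the data.
(df.write
    .mode("overwrite")
    .option("header", "true")
    .csv("s3://my-bucket/utf8/"))  # placeholder output path

Once the files are back in S3 as UTF-8, they can be crawled and read through DynamicFrames as usual.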

AWS
Answered 4 years ago

My Glue job is also failing with an "unable to parse" error when trying to process an ANSI-formatted file. Any solution?

Adarsh
Answered a year ago
