2 Answers
Hi,
b'\x96' is not a valid utf-8 encoded character, hence the error message, since you specified that your file is utf-8 encoded.
b'\x96' is an en dash ('–') in Windows-1252 (often loosely called latin1), so you may want to tell Comprehend that your file is latin1 instead of utf-8.
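To see the difference for yourself, here is a minimal Python sketch (the variable names are only for illustration). It tries to decode the single byte 0x96 three ways. Note that Python's strict `latin1` codec maps 0x96 to an invisible control character, while `cp1252` (Windows-1252) maps it to the en dash, which is why a Windows-exported file is more likely cp1252.

```python
# The byte reported in the error message.
raw = b"\x96"

# 0x96 is a UTF-8 continuation byte, so it cannot appear on its own.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Windows-1252 maps 0x96 to the en dash (U+2013).
as_cp1252 = raw.decode("cp1252")

# Strict ISO-8859-1 (latin1) maps 0x96 to the control character U+0096.
as_latin1 = raw.decode("latin1")

print(utf8_ok, repr(as_cp1252), repr(as_latin1))
```

This also suggests why deleting every plain hyphen ('-') would not help: the hyphen is a different character (byte 0x2D), so the 0x96 bytes would still be in the file.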
Best,
Didier
Thanks for the quick response, awesome! Are there any formatting guidelines for CSV that we can follow like removing these symbols?
answered 20 days ago
Hi Brendan, thanks for accepting my answer! Instead of removing chars, you may want to convert your file(s) from latin1 to utf-8: see https://milosophical.me/blog/2018/latin1-to-utf8.html
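For reference, a conversion along those lines could look like the sketch below. This is not taken from the linked post; the function name `convert_to_utf8` and the file names in the usage note are hypothetical, and it assumes the source file really is Windows-1252 (cp1252). If the guess is wrong, the output may contain wrong characters or the decode step may raise an error.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a text file to UTF-8 (hypothetical helper).

    Assumes `src` is encoded as `source_encoding`. Reading and writing
    raw bytes keeps the original line endings untouched.
    """
    # Decode using the assumed source encoding; this raises
    # UnicodeDecodeError if a byte is undefined in that encoding.
    text = Path(src).read_bytes().decode(source_encoding)
    # Write the same text back out as UTF-8.
    Path(dst).write_bytes(text.encode("utf-8"))
    return dst
```

Usage would be something like `convert_to_utf8("comments.csv", "comments_utf8.csv")`, with both file names being placeholders. Writing to a new file rather than overwriting the original keeps a backup in case the assumed encoding is wrong.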
Thanks for the resource. Unfortunately I'm not a programmer :> I tried saving as CSV UTF-8 in Excel, but it didn't work.
The required format for AWS Comprehend is CSV UTF-8. I tried to (1) remove all '-', but I still get the same error message, and (2) save as a UTF-8 file, but that causes some corruption of the file. Any other advice on how to deal with this?
I'm analyzing comments left on an enquiry form. I'm trying to train a model and then run asynchronous analysis of a larger dataset.