2 Answers
Hi,
b'\x96' on its own is not a valid UTF-8 byte sequence, hence the error message, since you specified that your file is UTF-8 encoded.
b'\x96' is a dash (the en dash, '–') in Windows-1252, the encoding that is often loosely called latin1. So you may want to tell Comprehend that your file is latin1 instead of utf-8.
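To see this for yourself, here is a minimal Python sketch that decodes the single byte three ways. The variable names are only illustrative, and it assumes the file came from a Windows tool such as Excel, where 0x96 is usually an en dash. Strict latin-1 (ISO-8859-1) maps the same byte to an invisible control character instead.

```python
# The byte reported in the error message.
raw = b"\x96"

# 0x96 is a UTF-8 continuation byte, so it cannot stand on its own.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Windows-1252 maps 0x96 to the en dash (U+2013).
as_cp1252 = raw.decode("cp1252")

# Strict ISO-8859-1 maps 0x96 to the C1 control character U+0096.
as_latin1 = raw.decode("latin-1")

print(utf8_ok, repr(as_cp1252), repr(as_latin1))
```

If the en dash is what you expect to see in your comments, the file is probably Windows-1252.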
Best,
Didier
Thanks for the quick response, awesome! Are there any formatting guidelines for CSV that we can follow like removing these symbols?
answered 2 years ago
Hi Brendan, thanks for accepting my answer! Instead of removing chars, you may want to convert your file(s) from latin1 to utf-8: see https://milosophical.me/blog/2018/latin1-to-utf8.html
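If you can run Python, the conversion could look roughly like the sketch below. The function name `convert_to_utf8` and the file names are hypothetical, and the default source encoding `cp1252` is an assumption based on the b'\x96' byte. Pass `latin-1` instead if that is what your file really uses.

```python
def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a text file as UTF-8 (hypothetical helper, not an AWS API)."""
    # newline="" keeps the original line endings untouched,
    # so quoted multi-line CSV fields are not altered.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Example call (file names are placeholders):
# convert_to_utf8("comments.csv", "comments_utf8.csv")
```

This writes a new file and leaves the original alone, so you can compare the two before uploading anything.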
Thanks for the resource. Unfortunately I'm not a programmer :> I tried saving the file as CSV UTF-8 in Excel, but it didn't work.
- asked 2 years ago

The required format for AWS Comprehend is CSV UTF-8. I tried to (1) remove all '-' characters, but I still get the same error message, and (2) save the file as UTF-8, but that corrupts part of the file. Any other advice on how to deal with this?
I'm analyzing comments left on an enquiry form. I'm trying to train a model and then run asynchronous analysis of a larger dataset.