1 Answer
Hi,
Both of the example formats you provided are correct in the sense that the second column in each is enclosed in double quotes. As long as you create a valid two-column CSV file, i.e. label in the first column and text/document in the second column, you should be fine. However, both of your examples contain a space after the comma that ends the first column (i.e., 'Drama, '), and that can make the file invalid. With the space there, the opening double quote is no longer the first character of the field, so it is typically not treated as a quote and the commas inside the text are read as column separators. You should avoid having a space after the comma separator. As for your example:
Drama, "['Here I am today.', 'Lots of text on lines.', 'Welcome to my example.']"
it will be considered a four-column CSV file. Whereas if you remove the space after 'Drama,' as below,
Drama,"['Here I am today.', 'Lots of text on lines.', 'Welcome to my example.']"
it will be considered as a valid two-column CSV file.
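If you want to check this locally before uploading, here is a minimal sketch using Python's standard csv module with its default settings. It is only a stand-in for illustration: the parser that Comprehend actually uses may differ, and the helper name count_columns is made up for this example.

```python
import csv


def count_columns(line: str) -> int:
    """Parse one CSV line with the default dialect and return its field count."""
    return len(next(csv.reader([line])))


# Same row as in the examples above, with and without the space after the comma.
with_space = '''Drama, "['Here I am today.', 'Lots of text on lines.', 'Welcome to my example.']"'''
without_space = '''Drama,"['Here I am today.', 'Lots of text on lines.', 'Welcome to my example.']"'''

# With the space, the double quote is not at the start of the field, so it is
# kept as a literal character and every comma splits the row.
print(count_columns(with_space))     # 4

# Without the space, the quoted text is read as a single field.
print(count_columns(without_space))  # 2
```

Running the same check over every row of the training file is a quick way to catch rows that do not parse as exactly two columns.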
answered a year ago
Thank you for the reply.
In the second column, does Comprehend then use the ' and , characters as part of the evaluation? Meaning, does it think the document has ' and , in the text?
The reason we ask is that we believed Comprehend would treat example 1 below as a list of phrases in the second column. Or does Comprehend treat all text in the second column as one long string of text?
Example 1
"['Here I am today.', 'Lots of text on lines.', 'Welcome to my example.']"
Example 2
"Here I am today. Lots of text on lines. Welcome to my example."