Comprehend Training Set Format

We had a vendor help set up our training set for Comprehend custom classification, and we are now questioning the design. We know it's a CSV with the class in the first column and the text in the second, but they designed it so that each line of text (from Textract) is wrapped in single quotes and comma-delimited. We are wondering whether it should instead be one long line of text wrapped in double quotes in the text column of the CSV.

Is this incorrect?

Drama, "['Here I am today.', 'Lots of text on lines.', 'Welcome to my example.']"

Is this correct?

Drama, "Here I am today. Lots of text on lines. Welcome to my example."

Troj
asked a year ago · 270 views
1 Answer

Hi,

Both of the example formats you provided are correct in the sense that the second column is enclosed in double quotes. As long as you create a valid two-column CSV file, i.e. the label in the first column and the text/document in the second column, you should be fine. However, both of your examples contain a space after the comma that follows the first column (i.e. 'Drama, '), and that can make the file invalid. You should avoid having a space after the comma separator. As for your example:

Drama, "['Here I am today.', 'Lots of text on lines.', 'Welcome to my example.']"

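it will be treated as a four-column CSV file: because of the space, the double quote no longer opens a quoted field, so the commas inside the text act as column separators. Whereas if you remove the space after 'Drama,' as below,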

Drama,"['Here I am today.', 'Lots of text on lines.', 'Welcome to my example.']"

it will be considered as a valid two-column CSV file.
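
To illustrate the point about the space, here is a minimal sketch using Python's standard `csv` module with its default settings. It only shows how a typical CSV parser splits the two lines; it is an assumption that Comprehend's own parser behaves the same way, and the helper name `parse` is made up for this example.

```python
import csv
import io

# The two variants from the answer: with and without a space after "Drama,".
with_space = '''Drama, "['Here I am today.', 'Lots of text on lines.', 'Welcome to my example.']"'''
without_space = '''Drama,"['Here I am today.', 'Lots of text on lines.', 'Welcome to my example.']"'''


def parse(line):
    """Parse a single CSV line with the default dialect and return its fields."""
    return next(csv.reader(io.StringIO(line)))


# With the space, the double quote is not at the start of the field, so it is
# kept as a literal character and the inner commas split the row.
print(len(parse(with_space)))     # 4

# Without the space, the whole quoted text is read as one field.
print(len(parse(without_space)))  # 2
```

Running a quick check like this over the training file before uploading it is a cheap way to confirm that every row really has two columns.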

AWS
Navin_Y
answered a year ago
  • Thank you for the reply.

    In the second column, does Comprehend then use the ' and , as part of the evaluation? Meaning, does it think the document has ' and , in the text?

    The reason we ask is that there was a belief that, with Example 1 below, Comprehend would treat the second column as a list of phrases. Or is it that Comprehend treats all text in the second column as one long run of text?

    Example 1: "['Here I am today.', 'Lots of text on lines.', 'Welcome to my example.']"
    Example 2: "Here I am today. Lots of text on lines. Welcome to my example."
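
At the CSV level, the difference between the two examples can be sketched as follows. This is only an illustration with Python's standard `csv` module: the `lines` list stands in for Textract output, and `to_row` and `second_column` are hypothetical helpers. It shows what text a standard parser hands over for the second column; it does not show how Comprehend interprets that text.

```python
import csv
import io

# Stand-in for the lines of text extracted from one document.
lines = ["Here I am today.", "Lots of text on lines.", "Welcome to my example."]


def to_row(label, text):
    """Serialize one (label, text) pair as a CSV line; quotes are added only when needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([label, text])
    return buf.getvalue()


def second_column(csv_line):
    """Parse a CSV line and return the text of its second column."""
    return next(csv.reader(io.StringIO(csv_line)))[1]


list_style = to_row("Drama", str(lines))        # Example 1: a stringified list
plain_style = to_row("Drama", " ".join(lines))  # Example 2: one run of text

# In Example 1 the brackets, single quotes and commas are part of the field text.
print(second_column(list_style))
# In Example 2 the field holds only the sentences themselves.
print(second_column(plain_style))
```

In this sketch the outer double quotes are removed by the parser, but everything inside them, including the `[`, `'` and `,` characters of Example 1, stays in the second column as ordinary text.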
