Error: The provided entity lists contained only generic or high-frequency words. Please provide additional unique entity names for training


I am trying to train Amazon Comprehend to recognize my custom entity types. When I submit a training job for my custom entities, it fails with the error below.

"The provided entity lists contained only generic or high-frequency words. Please provide additional unique entity names for training."

What does the error really mean?

I created 5 entity types, each with 200 different text entries, and about 40 training documents. Each training document is stored as a separate file in the S3 bucket. I provided separate S3 bucket names for the entity.csv file and for the 40 documents, which are stored as txt files.

Any help will be appreciated.

asked 2 years ago, 466 views
1 Answer

Comprehend filters out stop words, such as "the", from the list of entity names you provide. If no entries remain for a given entity type after this filtering, training fails with the error message quoted in the question. Please make sure you aren't using stop words as entity names.
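
One way to check an entity list before resubmitting is to drop stop-word-only entries locally and count what is left for each entity type. The sketch below is only an approximation. The `STOP_WORDS` set and the `surviving_entries` helper are hypothetical names made up for this example, Comprehend's actual filter list is not reproduced here, and the `Text,Type` header is assumed to match the layout of your entity.csv.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of stop words; NOT Comprehend's actual filter list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is", "it"}


def surviving_entries(csv_text, stop_words=STOP_WORDS):
    """Count, per entity type, the entries that are not made up solely of stop words."""
    counts = defaultdict(int)
    reader = csv.DictReader(io.StringIO(csv_text))  # assumes a "Text,Type" header row
    for row in reader:
        entity_type = row["Type"].strip()
        tokens = row["Text"].lower().split()
        counts[entity_type] += 0  # register the type even if nothing survives
        if tokens and not all(token in stop_words for token in tokens):
            counts[entity_type] += 1
    return dict(counts)


# Inline sample standing in for the contents of entity.csv.
sample = "Text,Type\nthe,DEVICE\nof,DEVICE\nAcme Router X200,DEVICE\nthe,VENDOR\n"
result = surviving_entries(sample)

# Entity types with zero surviving entries are the likely cause of the error.
flagged = [entity_type for entity_type, count in result.items() if count == 0]
print(result)
print(flagged)
```

Any entity type that shows up in `flagged` has no usable entries under this rough filter, so that type is a good first place to add more distinctive entity names.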

AWS
answered 2 years ago
