Does Amazon Comprehend custom entity recognition not work with semi-structured data in Spanish?


I want to extract custom entities from custom PDF documents in Spanish.

To do so, I am (unsuccessfully) trying to follow the tutorial "Extract entities from insurance documents using Amazon Comprehend named entity recognition".

As a side note, to annotate my custom data I successfully followed the related tutorial "Custom document annotation for extracting named entities in documents using Amazon Comprehend". I had no issues with that tutorial, and I have the annotations output.

My issue is with the first tutorial. After filling all the required fields to create and train a new model, I get an error message like this.

Given that all my documents are in Spanish, the error message is consistent with what I selected, but the language restriction itself seems too strict to be right. I see here that Spanish is listed as a supported language for the custom entity recognition feature of Amazon Comprehend.

What am I doing wrong? What assumptions am I making that are wrong?

1 Answer

Hey Daniel, you are doing nothing wrong. Right now, annotation in semi-structured documents is only available for English. That's why you get that message when selecting Spanish as the language and providing annotated PDF/Word documents. I will notify the team internally so the documentation/console gets updated on this!
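
To make the restriction concrete, here is a minimal sketch of how the training request could be assembled and checked before it is submitted. The helper `build_recognizer_request` and the set `SEMI_STRUCTURED_LANGUAGES` are hypothetical names used for illustration, and the English-only rule encoded here is an assumption taken from this answer, so it may no longer hold. The dictionary keys follow the parameters of the `CreateEntityRecognizer` API.

```python
# Assumption (from the answer above): semi-structured (PDF/Word)
# annotations are only accepted when the recognizer language is English.
SEMI_STRUCTURED_LANGUAGES = {"en"}


def build_recognizer_request(name, role_arn, entity_types, manifest_uri,
                             attribute_name, annotations_uri, documents_uri,
                             language_code):
    """Build keyword arguments for a CreateEntityRecognizer call.

    Hypothetical helper: it fails early, before any AWS call is made,
    if the language is not allowed for semi-structured annotations.
    """
    if language_code not in SEMI_STRUCTURED_LANGUAGES:
        raise ValueError(
            f"Semi-structured annotations are not supported for "
            f"language {language_code!r}"
        )
    return {
        "RecognizerName": name,
        "DataAccessRoleArn": role_arn,
        "LanguageCode": language_code,
        "InputDataConfig": {
            "DataFormat": "AUGMENTED_MANIFEST",
            "EntityTypes": [{"Type": t} for t in entity_types],
            "AugmentedManifests": [{
                "S3Uri": manifest_uri,
                "AttributeNames": [attribute_name],
                "AnnotationDataS3Uri": annotations_uri,
                "SourceDocumentsS3Uri": documents_uri,
                # Marks the training data as annotated PDF/Word documents.
                "DocumentType": "SEMI_STRUCTURED_DOCUMENT",
            }],
        },
    }
```

The resulting dictionary could then be passed to `boto3.client("comprehend").create_entity_recognizer(**request)`. With `language_code="es"` the helper raises a `ValueError` locally, which mirrors the error the console shows for Spanish.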

Dani M
answered 2 years ago

