The "context" mentioned here refers to the document text surrounding a mention, which is not necessarily an explicit key-value label. It is not something you annotate yourself.
To illustrate:
- A simple RegEx, e.g. \d{4}-\d{2}-\d{2}, could extract all date mentions (e.g. "2022-08-25") from text.
- A text entity recognition model could be trained to distinguish different entity types by the context in which they're mentioned. For example, 'Agreement_Start_Date' and 'Agreement_End_Date' if you were training a model on contract documents that might contain a sentence like "This agreement shall be effective from 2022-08-25, and will end on 2022-08-26".
- A layout-aware entity recognition model could be trained to distinguish different entity types by the combination of text content and overall page layout. For example, trying to distinguish the sender address vs recipient address in traditional letters.
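To make the first bullet concrete, here is a minimal Python sketch that runs the date pattern over the example contract sentence. The variable names are just for illustration. Notice that the RegEx finds both dates but has no idea which one is the start date and which is the end date; that distinction is what a trained entity recognition model would pick up from the surrounding words.

```python
import re

# Same pattern as in the bullet above: four digits, two digits, two digits.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

sentence = (
    "This agreement shall be effective from 2022-08-25, "
    "and will end on 2022-08-26"
)

# findall returns every non-overlapping match, in order of appearance.
dates = DATE_PATTERN.findall(sentence)
print(dates)  # ['2022-08-25', '2022-08-26']
```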
As mentioned on the other question, if your documents really do contain explicit Label: Value pairs then you probably don't need to go to the trouble of training a model in Comprehend: Textract Form Extraction should pick them out for you (you can try it out in the AWS Console for Amazon Textract).
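If you'd rather try this from code than from the console, the rough sketch below shows one way it could look with boto3. It assumes you have AWS credentials and a region configured, and that the document is a single-page image small enough to send as bytes. The helper names (analyze_form, extract_key_values) are my own, not part of any SDK; the response parsing follows the KEY_VALUE_SET block structure that Textract returns for the FORMS feature, and it ignores non-word children such as checkboxes.

```python
def extract_key_values(response):
    """Turn a Textract FORMS response into a {key_text: value_text} dict."""
    blocks = {b["Id"]: b for b in response["Blocks"]}

    def text_of(block):
        # Join the WORD children of a KEY or VALUE block into one string.
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                for child_id in rel["Ids"]:
                    child = blocks[child_id]
                    if child["BlockType"] == "WORD":
                        words.append(child["Text"])
        return " ".join(words)

    pairs = {}
    for block in blocks.values():
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                # A KEY block points at its VALUE block(s) by Id.
                value_text = " ".join(text_of(blocks[v]) for v in rel["Ids"])
        pairs[text_of(block)] = value_text
    return pairs


def analyze_form(path):
    """Hypothetical helper: send a local image to Textract Form Extraction."""
    import boto3  # imported here so the parser above works without it

    client = boto3.client("textract")
    with open(path, "rb") as f:
        return client.analyze_document(
            Document={"Bytes": f.read()},
            FeatureTypes=["FORMS"],
        )
```

Usage would be something like extract_key_values(analyze_form("my_form.png")), where the file name is a placeholder for your own document.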
If you have more complex documents where the pre-trained key: value detection in Amazon Textract doesn't meet the need, then yes, you could train an Amazon Comprehend model by example to extract different entity types ('Engineer_Name' and 'Analyst_Name', for example). I'd suggest referring to this two-part blog:
- Annotating the documents via Amazon SageMaker Ground Truth
- Training and using the model in Comprehend
If you were training the model to extract 'Engineer_Name' and 'Analyst_Name' entity types, you would highlight "John Doe" for one and "Jane Doe" for the other. The model would (ideally!) learn to pick out other analyst names vs engineer names when seeing similar-looking documents in future. Hope this helps to clarify!