Hello,
I understand that you would like to know whether Comprehend can group related entities together, for example the phone number and address that belong to the corresponding engineer entity.
Unfortunately, AWS Comprehend does not currently support grouping entities or tagging an entity with related ones, and there is no built-in feature to achieve this.
To follow whether this feature will be included in a future update to the Comprehend service, you can refer to the feature requests below. In the meantime, a workaround can be used for this scenario.
As you might already know, once you have a custom entity recognizer, you can use it for real-time entity recognition or for asynchronous analysis jobs. Both detect entities and return output in the following format:
```json
{
  "BeginOffset": 0,
  "EndOffset": 22,
  "Score": 0.9763959646224976,
  "Text": "John Johnson",
  "Type": "JUDGE"
}
```
For more details on the output of custom entity recognizers, refer to the links below:
- Real-time analysis: https://docs.aws.amazon.com/comprehend/latest/dg/outputs-cer-sync.html
- Asynchronous analysis jobs: https://docs.aws.amazon.com/comprehend/latest/dg/outputs-cer-async.html
As the output shows, each entity has a "BeginOffset" parameter, which marks where the entity begins in the source document, and an "EndOffset" parameter, which marks where it ends. (Please refer to the links above for more details on the entity fields in the output.)
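As a quick illustration of what the offsets mean, the sketch below slices a source document with each entity's offsets and checks that the slice reproduces the reported `Text`. The document string, the entity list, the entity types (`ENGINEER`, `PHONE_NUMBER`, `ADDRESS`) and the helper `entity_text_matches` are all made up for illustration, not real Comprehend output. It also assumes `EndOffset` is exclusive, so the two offsets can be used directly as a Python slice.

```python
# Hypothetical input text and entities, shaped like the entries in the
# recognizer output shown above (all values invented for illustration).
document = "Jane Doe, phone 555-0100, 12 Main St"

entities = [
    {"BeginOffset": 0, "EndOffset": 8, "Score": 0.99, "Text": "Jane Doe", "Type": "ENGINEER"},
    {"BeginOffset": 16, "EndOffset": 24, "Score": 0.98, "Text": "555-0100", "Type": "PHONE_NUMBER"},
    {"BeginOffset": 26, "EndOffset": 36, "Score": 0.97, "Text": "12 Main St", "Type": "ADDRESS"},
]


def entity_text_matches(doc: str, entity: dict) -> bool:
    """Return True if [BeginOffset, EndOffset) sliced from doc equals the entity's Text.

    Assumption: EndOffset is exclusive (one past the last character).
    """
    return doc[entity["BeginOffset"]:entity["EndOffset"]] == entity["Text"]


for e in entities:
    span = document[e["BeginOffset"]:e["EndOffset"]]
    print(e["Type"], repr(span), entity_text_matches(document, e))
```

If the check fails on your real output, the offsets are not being interpreted the way this sketch assumes, and the grouping logic should be adjusted before relying on it.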
We can use these offsets to check which phone number and address entities appear near an engineer entity, that is, which entities occur close to each other in the input document, and group them accordingly.
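One way to make "near each other" concrete is to count the characters separating two entity spans. Below is a minimal sketch, assuming each entity is a dict with the `BeginOffset`/`EndOffset` keys shown above and that `EndOffset` is exclusive; the function name `span_gap` is hypothetical.

```python
def span_gap(a: dict, b: dict) -> int:
    """Number of characters between two entity spans (0 if they touch or overlap).

    Assumes each span covers [BeginOffset, EndOffset), i.e. EndOffset is exclusive.
    """
    # If b starts after a ends, the gap is b's start minus a's end; if a starts
    # after b ends, it is the reverse. max(..., 0) handles overlapping spans.
    return max(0, b["BeginOffset"] - a["EndOffset"], a["BeginOffset"] - b["EndOffset"])


# Spans taken from the example offsets in this answer.
jane = {"Text": "Jane", "BeginOffset": 0, "EndOffset": 5}
phone = {"Text": "12345", "BeginOffset": 15, "EndOffset": 20}
print(span_gap(jane, phone))  # 10
```

The function is symmetric, so it does not matter whether the engineer or the phone number comes first in the text.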
For example, suppose the output contains the following details:
```
Engineer: "Jane", beginOffset: 0, endOffset: 5
PhoneNumber: 12345, beginOffset: 15, endOffset: 20
Engineer: "Helen", beginOffset: 90, endOffset: 95
PhoneNumber: 67890, beginOffset: 100, endOffset: 105
```
Here, because engineer Jane appears close to the phone number 12345 according to the offsets, that phone number is most probably related to Jane; likewise, 67890 most probably belongs to Helen.
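Applied to the example above, a simple heuristic is to attach each phone number (or address) to the engineer whose span is closest, and to leave it unassigned if no engineer is within some maximum distance. The sketch below does that. The entity type names, the `group_by_nearest_engineer` helper and the 50-character `max_gap` threshold are hypothetical choices you would tune for your own documents; none of them is provided by Comprehend.

```python
def span_gap(a, b):
    # Characters between two spans, assuming exclusive EndOffset (0 if they overlap).
    return max(0, b["BeginOffset"] - a["EndOffset"], a["BeginOffset"] - b["EndOffset"])


def group_by_nearest_engineer(entities, engineer_type="ENGINEER", max_gap=50):
    """Attach each non-engineer entity to the closest engineer within max_gap chars."""
    engineers = [e for e in entities if e["Type"] == engineer_type]
    # One group per engineer, in the order the engineers appear in `entities`.
    groups = [{"engineer": eng["Text"], "related": []} for eng in engineers]
    unassigned = []
    for ent in entities:
        if ent["Type"] == engineer_type:
            continue
        if not engineers:
            unassigned.append(ent["Text"])
            continue
        # Index of the closest engineer; min() keeps the first one on ties.
        idx = min(range(len(engineers)), key=lambda i: span_gap(engineers[i], ent))
        if span_gap(engineers[idx], ent) <= max_gap:
            groups[idx]["related"].append((ent["Type"], ent["Text"]))
        else:
            unassigned.append(ent["Text"])
    return groups, unassigned


# The example output from above, written as Comprehend-style entity dicts.
entities = [
    {"Type": "ENGINEER", "Text": "Jane", "BeginOffset": 0, "EndOffset": 5},
    {"Type": "PHONE_NUMBER", "Text": "12345", "BeginOffset": 15, "EndOffset": 20},
    {"Type": "ENGINEER", "Text": "Helen", "BeginOffset": 90, "EndOffset": 95},
    {"Type": "PHONE_NUMBER", "Text": "67890", "BeginOffset": 100, "EndOffset": 105},
]

groups, unassigned = group_by_nearest_engineer(entities)
for g in groups:
    print(g)
# {'engineer': 'Jane', 'related': [('PHONE_NUMBER', '12345')]}
# {'engineer': 'Helen', 'related': [('PHONE_NUMBER', '67890')]}
```

Nearest-span matching works well for prose where each engineer's details follow their name, but it can misassign entities in tables or multi-column layouts, so it is worth spot-checking the groups on a sample of real documents.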
However, please note that you will have to post-process the output to achieve this. You can use a local Python script, or a script in any other language, to process the output file and write the grouped entities to a file in the required format.
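A minimal post-processing sketch follows. It assumes the job output has already been extracted from the output archive into a JSON Lines file, where each line is an object with an `Entities` list and, optionally, `File`/`Line` fields identifying the source document. Check the asynchronous output documentation linked above for the exact shape of your output. The file names, entity types and helper names (`group_entities`, `group_output_file`) are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def span_gap(a, b):
    # Characters between two spans, assuming exclusive EndOffset (0 if they overlap).
    return max(0, b["BeginOffset"] - a["EndOffset"], a["BeginOffset"] - b["EndOffset"])


def group_entities(entities, engineer_type="ENGINEER", max_gap=50):
    """Attach each non-engineer entity to the nearest engineer within max_gap chars.

    Entities with no engineer close enough are simply dropped in this sketch.
    """
    engineers = [e for e in entities if e["Type"] == engineer_type]
    groups = [{"Engineer": e["Text"], "Related": []} for e in engineers]
    for ent in entities:
        if ent["Type"] == engineer_type or not engineers:
            continue
        idx = min(range(len(engineers)), key=lambda i: span_gap(engineers[i], ent))
        if span_gap(engineers[idx], ent) <= max_gap:
            groups[idx]["Related"].append({"Type": ent["Type"], "Text": ent["Text"]})
    return groups


def group_output_file(in_path, out_path, **kwargs):
    """Read a JSON Lines entity file, group entities per record, write one JSON file."""
    results = []
    with open(in_path, encoding="utf-8") as f:
        for raw in f:
            raw = raw.strip()
            if not raw:
                continue  # skip blank lines
            record = json.loads(raw)
            results.append({
                "File": record.get("File"),
                "Line": record.get("Line"),
                "Groups": group_entities(record.get("Entities", []), **kwargs),
            })
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    return results


# Usage with a small, made-up JSON Lines file standing in for the job output.
sample = {"File": "doc1.txt", "Line": 0, "Entities": [
    {"Type": "ENGINEER", "Text": "Jane", "BeginOffset": 0, "EndOffset": 5},
    {"Type": "PHONE_NUMBER", "Text": "12345", "BeginOffset": 15, "EndOffset": 20},
]}
tmp_dir = Path(tempfile.mkdtemp())
in_file = tmp_dir / "output.jsonl"
in_file.write_text(json.dumps(sample) + "\n", encoding="utf-8")
results = group_output_file(in_file, tmp_dir / "grouped.json")
print(results)
```

Keeping the `File`/`Line` identifiers next to each group makes it easy to trace a grouping back to the source document when reviewing the results.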
Okay, thanks. I thought that would be the case, but I just wanted to make sure I did not miss anything.