Textract multiple answers missing geometry


Hello, I would like to know if anything has changed in the way Textract returns query answers. If I ask "What is the title of this doc?" and scope the query to page 1, I get an answer with text and coordinates. However, for 'interpreted' answers, e.g. "What are the standards of this doc?" with the same page-1 scope, the geometry comes back as None.
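For reference, a page-scoped query like the ones described above is set up through `QueriesConfig` in the AnalyzeDocument request. The sketch below only builds the request arguments and does not call AWS. The bucket and object names are hypothetical placeholders, and `build_queries_request` is an illustrative helper, not part of any AWS SDK.

```python
from typing import Any, Dict, List, Tuple


def build_queries_request(
    bucket: str, key: str, queries: List[Tuple[str, str]], pages: List[str]
) -> Dict[str, Any]:
    """Build keyword arguments for an AnalyzeDocument call using the QUERIES feature.

    `queries` is a list of (question text, alias) pairs; `pages` is a list of
    page strings such as ["1"] that every query is restricted to.
    """
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias, "Pages": list(pages)}
                for text, alias in queries
            ]
        },
    }


request = build_queries_request(
    "my-bucket",          # hypothetical bucket
    "certificate.png",    # hypothetical object key
    [("what are the standards of the certified weight?", "tc_certified_shipping_standards")],
    ["1"],
)
```

With boto3, the dict would be passed as `boto3.client("textract").analyze_document(**request)`.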

query is TBlock(geometry=None, id='d1a1bac6-8c00-4b8b-91ef-72ff7d3398d9', block_type='QUERY', relationships=[TRelationship(type='ANSWER', ids=['d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3'])], confidence=None, text=None, column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=TQuery(text='what are the standards of the certified weight?', alias='tc_certified_shipping_standards'))

rels is TRelationship(type='ANSWER', ids=['d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3']) [TBlock(geometry=None, id='d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3', block_type='QUERY_RESULT', relationships=None, confidence=43.0, text='GRS, GRS', column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=None)]

I have a fairly large body of code that depends on these coordinates, and it ran without issues for five months straight. I checked that the other Textract-related libraries are on the same old versions as before, and I also tested on old git branches.
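If downstream code depends on coordinates, one way to keep it from crashing is to treat geometry on QUERY_RESULT blocks as optional. This is a minimal sketch that assumes the raw AnalyzeDocument JSON (a `Blocks` list with `BlockType`, `Relationships`, `Query`, and `Geometry` keys) rather than the parser's TBlock objects. `query_answers` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def query_answers(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each QUERY block with its QUERY_RESULT answers.

    Geometry is treated as optional: answers returned without it get
    bounding_box=None instead of raising a KeyError/AttributeError.
    """
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    results: List[Dict[str, Any]] = []
    for block in blocks.values():
        if block.get("BlockType") != "QUERY":
            continue
        query = block.get("Query", {})
        for rel in block.get("Relationships", []):
            if rel.get("Type") != "ANSWER":
                continue
            for answer_id in rel.get("Ids", []):
                answer = blocks.get(answer_id)
                if answer is None:
                    continue  # answer block not present in this response
                geometry = answer.get("Geometry")
                results.append({
                    "alias": query.get("Alias"),
                    "question": query.get("Text"),
                    "text": answer.get("Text"),
                    "confidence": answer.get("Confidence"),
                    "page": answer.get("Page"),
                    # None when Textract returned no Geometry for this answer
                    "bounding_box": geometry.get("BoundingBox") if geometry else None,
                })
    return results
```

Answers whose `bounding_box` is None can then be skipped, flagged for review, or handed to a fallback instead of failing the whole pipeline.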

So, is this a new way Textract answers queries?

Please and thank you!

  • Were you seeing a bounding box on interpreted answers previously with the same document?

  • To be completely honest, I inherited a small piece of code, grew it from there, and never had to look into this part because it was running smoothly. So I assume geometry was returned before, since the app didn't crash at this step.

  • I use the polygon coordinates. Here is what I now get from Textract, without polygon or geometry, at the point where it fails: TBlock(geometry=None, id='6e5deb40-4c90-47e7-b99d-933ac8c73231', block_type='QUERY_RESULT', relationships=None, confidence=43.0, text='GRS, GRS', column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=None), TBlock(geometry=None, id='d84596f0-3e59-4279-b907-f8f39a3b49dd', block_type='QUERY', relationships=[TRelationship(type='ANSWER', ids=['6e5deb40-4c90-47e7-b99d-933ac8c73231'])], confidence=None, text=None, column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=TQuery(text='what are the standards of the certified weight?', alias='tc_certified_shipping_standards')),

    My result has no TPoints with coordinates. Maybe this helps.

  • Answer from Textract with coordinates: TBlock(geometry=TGeometry(bounding_box=TBoundingBox(width=0.061864323914051056, height=0.010403391905128956, left=0.5233812928199768, top=0.3567923903465271), polygon=[TPoint(x=0.5233926177024841, y=0.3567923903465271), TPoint(x=0.5852456092834473, y=0.35685184597969055), TPoint(x=0.585235059261322, y=0.3671957850456238), TPoint(x=0.5233812928199768, y=0.367136150598526)]), id='d7fe92f2-c1d0-4298-857a-77cfd5d95c8e', block_type='QUERY_RESULT', relationships=None, confidence=94.0, text='803.28 kg', column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=None), TBlock(geometry=None, id='2aaa4f9f-6f0e-4ba0-8973-6a4462ca9bce', block_type='QUERY', relationships=[TRelationship(type='ANSWER', ids=['d7fe92f2-c1d0-4298-857a-77cfd5d95c8e'])], confidence=None, text=None, column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=TQuery(text='what is the net shipping weight?', alias='tc_net_shipping_weight')),

anyaovi
asked a year ago · 333 views
1 answer

On May 15, 2023, the Queries feature of the Amazon Textract AnalyzeDocument API received an update that improved the quality of its machine-learning models [1]. The update reduced latency for AnalyzeDocument calls that use Queries and improved data extraction accuracy for 14 new document types. To take advantage of these improvements, make sure you have updated your AWS CLI/SDK to the latest version.

If the issue persists, I suggest opening a case with AWS Premium Support. Their team has access to internal tools that can help identify and resolve the root cause of the issue.

AWS
answered a year ago
