Boto3 Textract start_document_analysis response changes breaking existing implementation


Even after pinning boto3 to 1.19.5 in our Lambda function, we are getting the response structure of the latest boto3 version from the start_document_analysis method. Is there a way to get the old response structure for start_document_analysis?

Earlier we used to get only one table per page. Since the latest fix (https://github.com/boto/boto3/blob/develop/CHANGELOG.rst#1216), we are getting multiple tables for the same page, even with the older boto3 version.

Please let us know how to get the older response structure.

asked 2 years ago · 273 views
1 answer

Textract did update its table model to support merged cells and table headers: https://aws.amazon.com/about-aws/whats-new/2022/03/amazon-textract-updates-tables-check-detection/

The update adds a new BlockType "MERGED_CELL", a new Relationships Type "MERGED_CELL", and a new EntityType "COLUMN_HEADER". If you don't need them, you can ignore them.
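A minimal sketch of "ignoring" the additions, assuming your existing parser works on the `Blocks` list returned by `get_document_analysis` (the call that fetches the results of a `start_document_analysis` job) and you want it to see only the older block shape. Note that the Textract API reference spells the new block type `MERGED_CELL` (singular). The function name `strip_table_additions` is hypothetical; it drops `MERGED_CELL` blocks, `MERGED_CELL` relationships and the `COLUMN_HEADER` entity type, and works on copies so the original response is left unchanged.

```python
from __future__ import annotations

from typing import Any

# Types added by the 2022 table update, as named in the Textract Block reference.
NEW_BLOCK_TYPES = {"MERGED_CELL"}
NEW_RELATIONSHIP_TYPES = {"MERGED_CELL"}
NEW_ENTITY_TYPES = {"COLUMN_HEADER"}


def strip_table_additions(blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: return copies of blocks without the newer table additions."""
    result = []
    for block in blocks:
        if block.get("BlockType") in NEW_BLOCK_TYPES:
            continue  # drop MERGED_CELL blocks entirely
        block = dict(block)  # shallow copy; new lists below never mutate the input
        if "Relationships" in block:
            block["Relationships"] = [
                rel for rel in block["Relationships"]
                if rel.get("Type") not in NEW_RELATIONSHIP_TYPES
            ]
        if "EntityTypes" in block:
            remaining = [e for e in block["EntityTypes"] if e not in NEW_ENTITY_TYPES]
            if remaining:
                block["EntityTypes"] = remaining
            else:
                del block["EntityTypes"]  # mimic the older shape with no entity types
        result.append(block)
    return result
```

The CELL blocks themselves are untouched, so code that walks a TABLE's CHILD cells keeps working on the filtered list.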

Apart from those additions, the response is the same as the "older" one: all CELL blocks of a TABLE are still listed in the TABLE's CHILD relationship. See: https://docs.aws.amazon.com/textract/latest/dg/how-it-works-tables.html
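To illustrate the CHILD relationship described above, here is a sketch in plain Python (no Textract dependency) that rebuilds each TABLE as a grid of cell text and groups the tables by page. The helper name `tables_by_page` and the choice to use only `WORD` children as cell text are assumptions for illustration. Merged cells are not expanded, so a cell spanning several rows or columns only fills its top-left position.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def tables_by_page(blocks: list[dict[str, Any]]) -> dict[int, list[list[list[str]]]]:
    """Hypothetical helper: page number -> list of tables, each a list of rows of cell text."""
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block: dict[str, Any]) -> list[str]:
        # CHILD links a TABLE to its CELLs and a CELL to its WORDs.
        return [
            cid
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for cid in rel["Ids"]
        ]

    def cell_text(cell: dict[str, Any]) -> str:
        # Assumption: only WORD children count as text (selection elements are skipped).
        words = [by_id[cid] for cid in child_ids(cell) if cid in by_id]
        return " ".join(w["Text"] for w in words if w["BlockType"] == "WORD")

    pages: dict[int, list[list[list[str]]]] = defaultdict(list)
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = [
            by_id[cid] for cid in child_ids(block)
            if by_id.get(cid, {}).get("BlockType") == "CELL"
        ]
        if not cells:
            continue
        n_rows = max(c["RowIndex"] for c in cells)
        n_cols = max(c["ColumnIndex"] for c in cells)
        grid = [[""] * n_cols for _ in range(n_rows)]
        for c in cells:
            # RowIndex and ColumnIndex are 1-based in Textract responses.
            grid[c["RowIndex"] - 1][c["ColumnIndex"] - 1] = cell_text(c)
        # Assumption: default to page 1 when a block carries no "Page" field.
        pages[block.get("Page", 1)].append(grid)
    return dict(pages)
```

Because each TABLE block becomes its own grid, a page on which Textract now detects two tables shows up as two entries in that page's list rather than as one combined table.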

I recommend using https://pypi.org/project/amazon-textract-response-parser/ for parsing the response in Python.

AWS
answered 2 years ago
