Boto3 Textract start_document_analysis response changes breaking existing implementation


Even after pinning boto3 to 1.19.5 in our Lambda function, we are getting the response that the latest boto3 version returns for the start_document_analysis method. Is there a way to get the old response structure for start_document_analysis?

Earlier we used to get only one table per page. With the latest change (https://github.com/boto/boto3/blob/develop/CHANGELOG.rst#1216), we are now getting multiple tables for the same page, even with the older version of boto3.

Please let us know how to get the older response structure.

asked 2 years ago · 272 views
1 Answer

Textract did update the table model to support merged cells and table headers: https://aws.amazon.com/about-aws/whats-new/2022/03/amazon-textract-updates-tables-check-detection/

The update adds a new BlockType called "MERGED_CELL", a Relationship Type "MERGED_CELL", and an EntityType "COLUMN_HEADER". If you don't need those, you can ignore them.

Outside of those additions, the response is the same as the "older" one: all CELL blocks of a TABLE are still listed in its CHILD Relationship. See: https://docs.aws.amazon.com/textract/latest/dg/how-it-works-tables.html
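
As a rough illustration of ignoring the new additions, here is a minimal sketch that walks a list of Blocks and keeps only the CELL children of each TABLE, skipping MERGED_CELL relationships. The function name `tables_with_cells` and the `sample_blocks` data are made up for this example, and it assumes you have already collected all Blocks from the paginated get_document_analysis results into one list.

```python
def tables_with_cells(blocks):
    """Map each TABLE block Id to its CELL children as (row, column, cell Id)."""
    by_id = {block["Id"]: block for block in blocks}
    result = {}
    for block in blocks:
        if block.get("BlockType") != "TABLE":
            continue
        cells = []
        for rel in block.get("Relationships", []):
            # Only follow CHILD relationships; MERGED_CELL relationships are ignored.
            if rel.get("Type") != "CHILD":
                continue
            for child_id in rel.get("Ids", []):
                child = by_id.get(child_id)
                if child and child.get("BlockType") == "CELL":
                    cells.append((child["RowIndex"], child["ColumnIndex"], child["Id"]))
        result[block["Id"]] = sorted(cells)
    return result


# Hypothetical, hand-written blocks that mimic the shape of a Textract response.
sample_blocks = [
    {
        "Id": "t1",
        "BlockType": "TABLE",
        "Relationships": [
            {"Type": "CHILD", "Ids": ["c1", "c2"]},
            {"Type": "MERGED_CELL", "Ids": ["m1"]},
        ],
    },
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "EntityTypes": ["COLUMN_HEADER"]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2},
    {"Id": "m1", "BlockType": "MERGED_CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2"]}]},
]

print(tables_with_cells(sample_blocks))
```

The EntityTypes field on a CELL is simply not read here, so COLUMN_HEADER markers have no effect on the result.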

I recommend using https://pypi.org/project/amazon-textract-response-parser/ for parsing the response in Python.

AWS
answered 2 years ago
