Boto3 Textract start_document_analysis response changes breaking existing implementation


Even after pinning boto3 to 1.19.5 in Lambda, we are still getting the latest response structure from the start_document_analysis method. Is there a way to get the old response structure for start_document_analysis?

Earlier we used to get only one table per page. Since the latest fix (https://github.com/boto/boto3/blob/develop/CHANGELOG.rst#1216), we are getting multiple tables for the same page, even with the older version of boto3.

Please let us know how to get the older response structure.

Asked 2 years ago, 273 views
1 Answer

Textract did update the table model to support merged cells and table headers: https://aws.amazon.com/about-aws/whats-new/2022/03/amazon-textract-updates-tables-check-detection/

The update adds a new BlockType called "MERGED_CELL", a Relationship Type "MERGED_CELL", and an EntityType "COLUMN_HEADER". If you don't need those, you can ignore them.
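Ignoring the additions can be done as a post-processing step on the returned Blocks list. As far as I understand, the response shape comes from the Textract service rather than from the boto3 client version, which is why filtering on the client side is the practical option. The sketch below is a minimal example under these assumptions: the helper name `strip_table_additions` and the sample data are made up for illustration, and the block type is matched using the API value `MERGED_CELL`.

```python
from copy import deepcopy


def strip_table_additions(blocks):
    """Return a copy of the Textract blocks without the newer table additions.

    Hypothetical helper: removes MERGED_CELL blocks, MERGED_CELL
    relationships, and the COLUMN_HEADER entity type.
    """
    cleaned = []
    for block in blocks:
        # Skip the merged-cell blocks entirely.
        if block.get("BlockType") == "MERGED_CELL":
            continue
        block = deepcopy(block)  # leave the original response untouched
        # Keep only relationships that existed before the update.
        if "Relationships" in block:
            kept = [r for r in block["Relationships"] if r.get("Type") != "MERGED_CELL"]
            if kept:
                block["Relationships"] = kept
            else:
                del block["Relationships"]
        # Drop the COLUMN_HEADER marker from cells.
        if "EntityTypes" in block:
            kept = [e for e in block["EntityTypes"] if e != "COLUMN_HEADER"]
            if kept:
                block["EntityTypes"] = kept
            else:
                del block["EntityTypes"]
        cleaned.append(block)
    return cleaned


# Made-up sample shaped like the "Blocks" list of a Textract response.
sample_blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2"]},
                       {"Type": "MERGED_CELL", "Ids": ["m1"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "EntityTypes": ["COLUMN_HEADER"]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2},
    {"Id": "m1", "BlockType": "MERGED_CELL", "RowIndex": 1, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 2},
]

for block in strip_table_additions(sample_blocks):
    print(block["Id"], block["BlockType"])
```

In a real job you would pass in the combined Blocks from all pages of get_document_analysis results. Note that this only hides the new fields; it does not change how many TABLE blocks Textract detects on a page.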

Outside of those additions, the response is the same as the "older" one: all CELL blocks of a TABLE are still listed under its CHILD Relationship. See: https://docs.aws.amazon.com/textract/latest/dg/how-it-works-tables.html
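To illustrate the CHILD relationship described above, here is a rough sketch that rebuilds a table's rows by following TABLE -> CELL -> WORD links and skipping every other relationship type. The function name `table_rows` and the sample blocks are hypothetical; the field names (`Relationships`, `Ids`, `RowIndex`, `ColumnIndex`, `Text`) follow the documented Block structure.

```python
def table_rows(blocks, table_block):
    """Rebuild a table as a list of rows using only CHILD relationships."""
    by_id = {b["Id"]: b for b in blocks}
    cells = {}
    for rel in table_block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue  # ignores MERGED_CELL and any other relationship type
        for cell_id in rel["Ids"]:
            cell = by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            # A cell's text is the WORD blocks listed as its children.
            words = []
            for cell_rel in cell.get("Relationships", []):
                if cell_rel["Type"] != "CHILD":
                    continue
                for child_id in cell_rel["Ids"]:
                    child = by_id[child_id]
                    if child["BlockType"] == "WORD":
                        words.append(child["Text"])
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
    # RowIndex and ColumnIndex are 1-based.
    n_rows = max((r for r, _ in cells), default=0)
    n_cols = max((c for _, c in cells), default=0)
    return [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
            for r in range(1, n_rows + 1)]


# Made-up sample: a 2x2 table with one empty cell and one merged-cell entry.
sample = [
    {"Id": "t1", "BlockType": "TABLE", "Page": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]},
                       {"Type": "MERGED_CELL", "Ids": ["m1"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2},
    {"Id": "m1", "BlockType": "MERGED_CELL", "RowIndex": 2, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 2},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Apple"},
]

tables = [b for b in sample if b["BlockType"] == "TABLE"]
for table in tables:
    print(table["Page"], table_rows(sample, table))
```

Code written against the older response that walks tables this way should keep working, since the extra MERGED_CELL entries are never followed.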

I recommend using https://pypi.org/project/amazon-textract-response-parser/ for parsing the response in Python.

AWS
Answered 2 years ago
