amazon-textract-response-parser (Python) throws "ValidationError: {'_schema': ['Invalid input type.']}" on an Amazon Textract expense analysis response

Hi,

I tried following the instructions at https://aws.amazon.com/fr/blogs/machine-learning/announcing-expanded-support-for-extracting-data-from-invoices-and-receipts-using-amazon-textract/ to parse the Textract response from a call to the get_expense_analysis API (an asynchronous API call on a PDF file).

I get a response from the API that appears to be valid JSON and compliant with the Textract Expense API documentation.

But when I execute the following code, which I copied from the article:

from trp.trp2_expense import TAnalyzeExpenseDocument, TAnalyzeExpenseDocumentSchema
t_doc = TAnalyzeExpenseDocumentSchema().load(out)
# out is the JSON output from the expense analysis API call (tried both the text form and the parsed dictionary form to be sure)

I get the following error:

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
/tmp/ipykernel_7538/447086103.py in <cell line: 2>()
      1 from trp.trp2_expense import TAnalyzeExpenseDocument, TAnalyzeExpenseDocumentSchema
----> 2 t_doc = TAnalyzeExpenseDocumentSchema().load(out)

~/anaconda3/envs/python3/lib/python3.8/site-packages/marshmallow/schema.py in load(self, data, many, partial, unknown)
    717             if invalid data are passed.
    718         """
--> 719         return self._do_load(
    720             data, many=many, partial=partial, unknown=unknown, postprocess=True
    721         )

~/anaconda3/envs/python3/lib/python3.8/site-packages/marshmallow/schema.py in _do_load(self, data, many, partial, unknown, postprocess)
    902             exc = ValidationError(errors, data=data, valid_data=result)
    903             self.handle_error(exc, data, many=many, partial=partial)
--> 904             raise exc
    905 
    906         return result

ValidationError: {'_schema': ['Invalid input type.']}

It is supposed to work out of the box as documented in the article.

I am using Python 3.8.12 and have successfully installed amazon-textract-response-parser-0.1.30, botocore-1.24.46 and marshmallow-3.14.1.

What am I missing? (I can share the PDF or the output of the API call if needed.)

Any help welcome.

Christian.

c_ds
asked 2 years ago, 610 views
1 Answer

Can you file a ticket against the repository at https://github.com/aws-samples/amazon-textract-response-parser/issues? Ideally, attach the JSON that causes the issue; that will help with debugging. Thank you.
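
One thing that may be worth ruling out before or while filing the ticket: marshmallow reports {'_schema': ['Invalid input type.']} when the object passed to load() is not a dict-like mapping, for example a JSON string, a doubly encoded JSON string, or a list of response pages. The following is only a minimal diagnostic sketch using the standard library; the helper name describe_textract_payload is hypothetical and not part of amazon-textract-response-parser, and the check for an "ExpenseDocuments" key assumes the response shape described in the Textract Expense API documentation.

```python
import json
from collections.abc import Mapping


def describe_textract_payload(out):
    """Return (payload, problem) for a candidate expense analysis response.

    `problem` is None when `payload` is a dict-like object that contains an
    'ExpenseDocuments' key; otherwise it is a short description of what
    looks wrong. Hypothetical helper, for diagnosis only.
    """
    payload = out

    # A JSON document held as text must be decoded before schema loading.
    if isinstance(payload, (str, bytes, bytearray)):
        payload = json.loads(payload)

    # marshmallow's Schema.load() expects a mapping at the top level.
    if not isinstance(payload, Mapping):
        return payload, f"top-level value is {type(payload).__name__}, not a dict"

    # Assumption: an expense analysis response carries 'ExpenseDocuments'.
    if "ExpenseDocuments" not in payload:
        return payload, "dict has no 'ExpenseDocuments' key"

    return payload, None
```

If problem comes back as None, payload is the object to hand to TAnalyzeExpenseDocumentSchema().load(payload). If it reports a list or a str, the input shape rather than the parser is the likely culprit; with the asynchronous API this can happen when several result pages (fetched via NextToken) are collected into a list, or when the JSON was serialized twice.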

AWS
answered 2 years ago
  • Hi Martin. Done. Thanks!
