PDF files with multiple pages do not work



We use Textract for automatic processing of PDFs (invoices). Single-page PDF files are processed without problems, but Textract fails on PDF files that contain multiple pages. The error occurs with both asynchronous and synchronous communication.

We get the following message: Request has unsupported document format

We are using PHP 7.4.3 with the AWS PHP SDK (https://github.com/aws/aws-sdk-php):

"version": "3.232.2"


            // Send the uploaded file inline as raw bytes to the synchronous API
            $options = [
                'Document' => [
                    'Bytes' => file_get_contents($uploadedFile->getRealPath()),
                ],
            ];
            $result = $client->analyzeExpense($options);

Where is the error? Can you help?

asked 8 months ago, 88 views
2 Answers

Hi Patrick,

Textract has two modes for processing documents: synchronous and asynchronous. The difference is pretty well summed up here:

Amazon Textract provides synchronous operations for processing small, single-page, documents and with near real-time responses. For more information, see Processing Documents with Synchronous Operations. Amazon Textract also provides asynchronous operations that you can use to process larger, multipage documents. Asynchronous responses aren't in real time. For more information, see Processing Documents with Asynchronous Operations.

The $client->analyzeExpense($options) call you are making uses one of the synchronous API endpoints, and so it doesn't support multi-page documents.

Instead, you'll need to use the startExpenseAnalysis method, which starts an asynchronous job. Rather than returning the result directly, this method returns a JobId, which you pass to the getExpenseAnalysis method to fetch the results once they are ready.
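
For illustration, here is a minimal sketch of that flow with the AWS SDK for PHP. It is not a drop-in implementation. It assumes the PDF has already been uploaded to S3, because the asynchronous operations read the document from a DocumentLocation (an S3 object) rather than from inline Bytes. The helper names startExpenseJob and waitForExpenseDocuments, the bucket, the key, the region, and the polling interval are all made up for this example, and simple polling is just one way to wait for the job.

```php
<?php
// Sketch only: $client is expected to be an Aws\Textract\TextractClient
// (aws-sdk-php v3). Helper names and polling settings are assumptions.

/**
 * Start an asynchronous expense analysis for a document stored in S3
 * and return the JobId that identifies the job.
 */
function startExpenseJob($client, string $bucket, string $key): string
{
    $result = $client->startExpenseAnalysis([
        'DocumentLocation' => [
            'S3Object' => ['Bucket' => $bucket, 'Name' => $key],
        ],
    ]);

    return $result['JobId'];
}

/**
 * Poll getExpenseAnalysis until the job is no longer IN_PROGRESS, then
 * collect ExpenseDocuments from every result page by following NextToken.
 */
function waitForExpenseDocuments($client, string $jobId, int $delaySeconds = 5, int $maxAttempts = 60): array
{
    $documents = [];
    $nextToken = null;
    $attempts = 0;

    do {
        $params = ['JobId' => $jobId];
        if ($nextToken !== null) {
            $params['NextToken'] = $nextToken;
        }
        $result = $client->getExpenseAnalysis($params);
        $status = $result['JobStatus'];

        if ($status === 'IN_PROGRESS') {
            // Not ready yet: wait and ask again, up to $maxAttempts times.
            if (++$attempts >= $maxAttempts) {
                throw new RuntimeException("Textract job $jobId did not finish in time");
            }
            sleep($delaySeconds);
            continue;
        }
        if ($status === 'FAILED') {
            throw new RuntimeException('Textract job failed: ' . ($result['StatusMessage'] ?? 'unknown'));
        }

        // SUCCEEDED or PARTIAL_SUCCESS: keep this page and move to the next one.
        $documents = array_merge($documents, $result['ExpenseDocuments'] ?? []);
        $nextToken = $result['NextToken'] ?? null;
    } while ($status === 'IN_PROGRESS' || $nextToken !== null);

    return $documents;
}

// Example usage (region, bucket, and key are placeholders):
// require 'vendor/autoload.php';
// $client = new Aws\Textract\TextractClient(['region' => 'eu-central-1', 'version' => 'latest']);
// $jobId = startExpenseJob($client, 'my-invoice-bucket', 'invoices/multi-page.pdf');
// $documents = waitForExpenseDocuments($client, $jobId);
```

For production use, a completion notification is usually a better fit than polling, but the loop keeps the example short.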



answered 7 months ago

Hi Marrick, thanks for your support. It works.

answered 7 months ago
