PDF files with multiple pages do not work


Hi,

We use Textract for automatic processing of PDFs (invoices). PDF files that consist of only one page can be processed very well. Textract has problems with PDF files that contain multiple pages. The error occurs in asynchronous and synchronous communication.

We get the following message: Request has unsupported document format

We are using PHP 7.4.3 with the AWS PHP SDK (https://github.com/aws/aws-sdk-php):

"aws/aws-sdk-php",
"version": "3.232.2"

Code:

            $options = [
                // The SDK's parameter names are case-sensitive: 'Document', not 'document'.
                'Document' => [
                    'Bytes' => file_get_contents($uploadedFile->getRealPath())
                ],
            ];
            $result = $client->analyzeExpense($options);

Where is the error? Can you help?

Asked 2 years ago, 289 views
2 answers

Hi Patrick,

Textract has two modes for processing documents: synchronous and asynchronous. The difference is pretty well summed up here:

Amazon Textract provides synchronous operations for processing small, single-page, documents and with near real-time responses. For more information, see Processing Documents with Synchronous Operations. Amazon Textract also provides asynchronous operations that you can use to process larger, multipage documents. Asynchronous responses aren't in real time. For more information, see Processing Documents with Asynchronous Operations.

The $client->analyzeExpense($options) call you are making uses one of the synchronous API endpoints, and so it doesn't support multi-page documents.

Instead, you'll need to use the startExpenseAnalysis method, which starts an asynchronous job. Rather than returning the result directly, this method returns a JobId, which you pass to the getExpenseAnalysis method to fetch the results once they are ready.
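
For illustration, here is a rough sketch of how the two calls might fit together. It is not a definitive implementation. One thing to be aware of: the asynchronous operations read the document from an S3 bucket (DocumentLocation) and do not accept raw Bytes, so the PDF has to be uploaded to S3 first. The function name analyzeMultiPageExpense, the bucket and key values, the region, and the polling settings are all placeholders I made up for the example.

```php
<?php
// Usage with the real SDK (assumed setup, adjust region and credentials):
//   require 'vendor/autoload.php';
//   $client = new Aws\Textract\TextractClient(['region' => 'eu-central-1', 'version' => 'latest']);
//   $docs = analyzeMultiPageExpense($client, 'my-invoice-bucket', 'invoices/example.pdf');

/**
 * Hypothetical helper: starts an asynchronous expense analysis for a PDF
 * that is already stored in S3, polls until the job finishes, and returns
 * all ExpenseDocuments. $client is expected to be an Aws\Textract\TextractClient.
 */
function analyzeMultiPageExpense($client, string $bucket, string $key, int $maxAttempts = 60, int $delaySeconds = 5): array
{
    // Asynchronous operations take an S3 location instead of 'Bytes'.
    $start = $client->startExpenseAnalysis([
        'DocumentLocation' => [
            'S3Object' => ['Bucket' => $bucket, 'Name' => $key],
        ],
    ]);
    $jobId = $start['JobId'];

    // Poll until the job leaves the IN_PROGRESS state or we run out of attempts.
    $status = 'IN_PROGRESS';
    $page = [];
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $page = $client->getExpenseAnalysis(['JobId' => $jobId]);
        $status = $page['JobStatus'];
        if ($status !== 'IN_PROGRESS') {
            break;
        }
        sleep($delaySeconds);
    }
    if ($status !== 'SUCCEEDED') {
        throw new RuntimeException("Textract job $jobId did not succeed (last status: $status)");
    }

    // Results can be paginated, so follow NextToken until it is absent.
    $documents = $page['ExpenseDocuments'] ?? [];
    while (!empty($page['NextToken'])) {
        $page = $client->getExpenseAnalysis([
            'JobId' => $jobId,
            'NextToken' => $page['NextToken'],
        ]);
        $documents = array_merge($documents, $page['ExpenseDocuments'] ?? []);
    }
    return $documents;
}
```

Polling in a loop is the simplest option for a quick test. For production use, Textract can also publish a completion notification to an SNS topic (the NotificationChannel parameter of startExpenseAnalysis), which avoids repeated status calls.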

Thanks,

Marrick

Marrick
Answered 2 years ago

Hi Marrick, thanks for your support. It works.

Answered 2 years ago
