Textract-API not returning TABLE specific data when using PHP client


I'm using the following code:

private function analyze(string $fileName): Result
{
    return $this->textractClient->analyzeDocument(
        [
            'Document' => [
                'S3Object' => [
                    'Bucket' => $this->bucketName,
                    'Name' => $fileName,
                ],
            ],
            'FeatureTypes' => ['TABLES', 'FORMS']
        ]
    );
}

I point to a JPEG image in the bucket, and the call itself succeeds. The image contains a table that is processed correctly by the Textract web interface when I upload the original PDF (the JPEG was extracted from that PDF). In the PHP result, however, there are no blocks of type "TABLE" or "CELL"; they are all of type "LINE".
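For reference, one way to check which block types a response actually contains is to tally them. This is only a minimal sketch: `countBlockTypes` and the sample data are hypothetical, and it assumes the list passed in has the shape of the `Blocks` entry of an AnalyzeDocument result (for example `$result['Blocks']`), where each block carries a `BlockType` key.

```php
<?php
// Tally Textract block types so that missing TABLE/CELL blocks are easy to spot.
// $blocks is assumed to be the 'Blocks' list of an AnalyzeDocument result.
function countBlockTypes(array $blocks): array
{
    $counts = [];
    foreach ($blocks as $block) {
        // Fall back to 'UNKNOWN' if a block has no BlockType key.
        $type = $block['BlockType'] ?? 'UNKNOWN';
        $counts[$type] = ($counts[$type] ?? 0) + 1;
    }
    ksort($counts); // stable, alphabetical ordering for easier comparison
    return $counts;
}

// Hypothetical sample data standing in for a real response.
$sampleBlocks = [
    ['BlockType' => 'PAGE'],
    ['BlockType' => 'LINE'],
    ['BlockType' => 'LINE'],
    ['BlockType' => 'WORD'],
];

print_r(countBlockTypes($sampleBlocks));
```

Running the same tally on the result for the JPEG and on a result known to contain tables makes the difference visible at a glance.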

Am I doing something wrong?

Any help would be highly appreciated.

Michiel
asked 2 years ago · 385 views
1 Answer

Hi, just to confirm: is your PDF a multi-page document or a single-page document? If it is a multi-page document, the result may be paginated, in which case the TABLE blocks might not be in the part of the result you are searching. Thanks.
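To illustrate the pagination point: paginated Textract results (such as those from the asynchronous GetDocumentAnalysis operation) link their pages with a `NextToken` value, so all pages have to be collected before searching for TABLE blocks. The following is a minimal sketch, not the SDK's own API: `collectAllBlocks` and `$fetchPage` are hypothetical names, and `$fetchPage` is assumed to wrap whatever call returns one page as an array with `Blocks` and, when more pages remain, `NextToken`.

```php
<?php
// Collect the blocks from every page of a paginated result.
// $fetchPage is a hypothetical callable: it receives the current token
// (null for the first page) and returns one page as an array.
function collectAllBlocks(callable $fetchPage): array
{
    $blocks = [];
    $token = null;
    do {
        $page = $fetchPage($token);
        $blocks = array_merge($blocks, $page['Blocks'] ?? []);
        // A missing NextToken means this was the last page.
        $token = $page['NextToken'] ?? null;
    } while ($token !== null);

    return $blocks;
}

// Fake two-page result for illustration only; the key '' stands for "no token".
$pages = [
    ''   => ['Blocks' => [['BlockType' => 'LINE']], 'NextToken' => 'p2'],
    'p2' => ['Blocks' => [['BlockType' => 'TABLE'], ['BlockType' => 'CELL']]],
];
$fetchPage = fn(?string $token): array => $pages[$token ?? ''];

$all = collectAllBlocks($fetchPage);
```

With a real client, `$fetchPage` would issue the request for the given token and return the response; that wiring is left out here because it depends on how the job was started.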

AWS
answered 2 years ago
  • The PDF is indeed a multi-page document, but the JPEG I'm passing to the AnalyzeDocument call is a single image, so it is just one page. That page has a table on it, which is extracted fine when I use the web interface with the original PDF but comes back as LINE blocks when I analyze the JPEG.
