I'm using the following code:
private function analyze(string $fileName): Result
{
    return $this->textractClient->analyzeDocument(
        [
            'Document' => [
                'S3Object' => [
                    'Bucket' => $this->bucketName,
                    'Name' => $fileName,
                ],
            ],
            'FeatureTypes' => ['TABLES', 'FORMS'],
        ]
    );
}
I point it at a JPEG image in the bucket and the call succeeds. The image contains a table, which is processed correctly when I use the Textract web interface (by uploading the original PDF from which the JPEG was extracted). In the PHP result, however, there are no blocks of type "TABLE" or "CELL"; they are all of type "LINE".
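For reference, here is a minimal sketch of how the block types in a response can be tallied. It assumes the result exposes a 'Blocks' array whose entries each carry a 'BlockType' key, as in the AnalyzeDocument response shape. The helper name countBlockTypes and the sample data are hypothetical; in real use the input would be $result['Blocks'].

```php
<?php
declare(strict_types=1);

/**
 * Count how many blocks of each type a Textract response contains.
 * $blocks is expected to be the 'Blocks' array from the response.
 * (Hypothetical helper, not part of the AWS SDK.)
 */
function countBlockTypes(array $blocks): array
{
    // Pull out every 'BlockType' value, then count occurrences per type.
    return array_count_values(array_column($blocks, 'BlockType'));
}

// Hypothetical sample data standing in for $result['Blocks'].
$sampleBlocks = [
    ['BlockType' => 'PAGE'],
    ['BlockType' => 'LINE'],
    ['BlockType' => 'LINE'],
    ['BlockType' => 'WORD'],
];

$counts = countBlockTypes($sampleBlocks);
print_r($counts);
```

If "TABLE" and "CELL" are missing from the keys of the returned array, the response contained no table blocks at all.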
Am I doing something wrong?
Any help would be highly appreciated.
The PDF is indeed a multi-page document, but the JPEG I'm passing to the AnalyzeDocument call is a single image and therefore just one page. That page has a table on it, which is extracted fine when I use the web interface with the original PDF but comes back as LINE blocks when I analyze the JPEG.