How to parse and extract key value pairs from the Textract response in PHP in JSON format

0

Hi,

Please help me parse and extract key-value pairs from the Textract AnalyzeDocument response in PHP.

Below is the code where I'm able to receive a response after calling the Textract API:


<?php 

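require 'vendor/autoload.php'; // Composer autoloader for the AWS SDK for PHP

use Aws\Textract\TextractClient;

$client = new TextractClient([
    'region' => 'ap-south-1',
    'version' => '2018-06-27',
    'credentials' => [
        'key'    => 'XXXXXXXXXXXXXXXXXX',
        'secret' => 'XXXXXXXXXXXXXXXXX+XXXXXXXXXXXXXXXXXXXXXx'
    ]
]);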

// The file in this project.
$filename = "../receipt3.jpg";
$file = fopen($filename, "rb");
$contents = fread($file, filesize($filename));
fclose($file);
$options = [
    'Document' => [
		'Bytes' => $contents
    ],
    'FeatureTypes' => ['FORMS', 'TABLES'], // REQUIRED
];

$result = $client->analyzeDocument($options);
echo print_r($result, true);

?>

==================================================================

Also, is it possible to extract only selected fields using AnalyzeDocument, like the "Normalized Field: VENDOR_NAME" etc. that is available in AnalyzeExpense?

Please share sample code for PHP, since we have searched all over the Internet and could not find any resources for the above request.

Regards,

Zahid

Zakhid
asked 2 years ago, 2364 views
4 Answers
1

Thanks for reaching out regarding tables. Amazon Textract can extract tables and the cells inside them. Please refer to the documentation here to learn more about the TABLE and CELL block types and their relationships.

The following sample code indexes every block by Id and collects the TABLE blocks:

$blocksMap = array();
$tableBlocks = array();
foreach ($result["Blocks"] as $block) {
    $blocksMap[$block["Id"]] = $block;
    if ($block["BlockType"] == "TABLE") {
        array_push($tableBlocks, $block);
    }
}

Using $blocksMap and $tableBlocks, the CELL blocks of each table can then be grouped by row and column:

foreach ($tableBlocks as $table) {
    // A TABLE block has no Relationships entry when it contains no cells.
    foreach ($table["Relationships"] ?? [] as $relationship) {
        if ($relationship["Type"] == "CHILD") {
            $rowsMap = array();
            foreach ($relationship["Ids"] as $childId) {
                // Look up the child block; skip any Id missing from the map.
                $cell = $blocksMap[$childId] ?? null;
                if (!is_null($cell) && isset($cell["BlockType"]) && $cell["BlockType"] == "CELL") {
                    $rowIndex = $cell["RowIndex"];
                    $colIndex = $cell["ColumnIndex"];
                    if (!isset($rowsMap[$rowIndex])) {
                        $rowsMap[$rowIndex] = array();
                    }
                    $rowsMap[$rowIndex][$colIndex] = $cell;
                }
            }
            print_r($rowsMap);
        }
    }
}

A similar idea can be applied to reach the WORD blocks under each cell and extract the cell's text. Please refer to the sample code in Python (see here) for this.
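
As a rough PHP sketch of that idea (an illustration, not a port of the linked Python sample): each CELL block lists its WORD children under a CHILD relationship, so the cell text can be rebuilt by looking those Ids up in the block map. The function name getCellText and the hand-built sample blocks are assumptions made for this example only.

```php
<?php
// Sketch: join the WORD children of a CELL block into one string.
// $blocksMap maps block Id => block, as built in the first loop above.
function getCellText(array $cell, array $blocksMap): string
{
    $words = [];
    foreach ($cell['Relationships'] ?? [] as $relationship) {
        if ($relationship['Type'] !== 'CHILD') {
            continue;
        }
        foreach ($relationship['Ids'] as $childId) {
            $child = $blocksMap[$childId] ?? null;
            if ($child === null) {
                continue; // Id not present in the map
            }
            if ($child['BlockType'] === 'WORD') {
                $words[] = $child['Text'];
            } elseif ($child['BlockType'] === 'SELECTION_ELEMENT'
                && $child['SelectionStatus'] === 'SELECTED') {
                $words[] = 'X'; // mark a ticked checkbox
            }
        }
    }
    return implode(' ', $words);
}

// Hypothetical blocks, shaped like entries of an AnalyzeDocument response.
$sampleBlocksMap = [
    'w1' => ['Id' => 'w1', 'BlockType' => 'WORD', 'Text' => 'Sample'],
    'w2' => ['Id' => 'w2', 'BlockType' => 'WORD', 'Text' => 'Item'],
    'c1' => ['Id' => 'c1', 'BlockType' => 'CELL', 'RowIndex' => 1, 'ColumnIndex' => 1,
             'Relationships' => [['Type' => 'CHILD', 'Ids' => ['w1', 'w2']]]],
    'c2' => ['Id' => 'c2', 'BlockType' => 'CELL', 'RowIndex' => 1, 'ColumnIndex' => 2],
];

echo getCellText($sampleBlocksMap['c1'], $sampleBlocksMap), "\n"; // Sample Item
```

In the table loop above, storing getCellText($cell, $blocksMap) instead of $cell in $rowsMap would give the text per row and column. Note that PHP joins strings with "." (or implode), not "+".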

AWS
answered 2 years ago
  • I really appreciate the reply and the code, but the above code only gives me the CELL details and not the value of each cell for the rows containing the Description, Qty and Amount (please refer to the picture here: http://drnko.rs-technology.co.in/aws/receipt16.jpg).

    Could you please post the code to extract the values from the 2 tables in the above receipt16.jpg?

    Thanks for the link to the sample code, but I don't code in Python. Could you please share PHP code for the above table, with the respective line items containing Description, Qty and Rate?

    I appreciate your valuable help in advance!

0

Thank you for using Amazon Textract.

The page count and the KEY_VALUE_SET blocks in the response can be printed using code like:

print($result["DocumentMetadata"]["Pages"]);

foreach ($result["Blocks"] as $value) {
    if ($value["BlockType"] == 'KEY_VALUE_SET') {
        print_r($value);
    }
}

The response structure for AnalyzeExpense (refer here) is different from the response structure of AnalyzeDocument (refer here).
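
To illustrate the difference: as far as I know, the normalized field types such as VENDOR_NAME appear only in the AnalyzeExpense response, under ExpenseDocuments, then SummaryFields, where each field has a Type and a ValueDetection. The KEY_VALUE_SET blocks of AnalyzeDocument carry the key text as printed on the document instead. The following is a minimal sketch of reading selected normalized fields. The function name getExpenseSummaryFields and the sample response fragment are hypothetical.

```php
<?php
// Sketch: collect normalized summary fields from an AnalyzeExpense-shaped
// response. Returns field type => list of detected value texts.
function getExpenseSummaryFields(array $response, array $wantedTypes = []): array
{
    $fields = [];
    foreach ($response['ExpenseDocuments'] ?? [] as $document) {
        foreach ($document['SummaryFields'] ?? [] as $field) {
            $type = $field['Type']['Text'] ?? 'OTHER';
            // An empty $wantedTypes list means "keep every field".
            if ($wantedTypes && !in_array($type, $wantedTypes, true)) {
                continue;
            }
            $fields[$type][] = $field['ValueDetection']['Text'] ?? '';
        }
    }
    return $fields;
}

// Hypothetical response fragment. With the SDK it would come from
// $client->analyzeExpense(['Document' => ['Bytes' => $contents]]).
$sampleExpense = ['ExpenseDocuments' => [['SummaryFields' => [
    ['Type' => ['Text' => 'VENDOR_NAME'], 'ValueDetection' => ['Text' => 'Example Store']],
    ['Type' => ['Text' => 'TOTAL'], 'ValueDetection' => ['Text' => '120.00']],
    ['Type' => ['Text' => 'OTHER'], 'LabelDetection' => ['Text' => 'Cashier'],
     'ValueDetection' => ['Text' => 'A. Person']],
]]]];

print_r(getExpenseSummaryFields($sampleExpense, ['VENDOR_NAME', 'TOTAL']));
```

Passing no second argument keeps every summary field, including those typed OTHER.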

To extract the key-value pairs from the AnalyzeDocument response, please refer to the logic in the sample Python code (see here). You can use it as a reference when coding the key-value extraction in PHP.
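
A minimal PHP sketch of that logic, under the assumption that the response follows the usual AnalyzeDocument block layout: a KEY block points to its VALUE block through a VALUE relationship, and both point to their WORD blocks through CHILD relationships. The function names blockText and extractKeyValues, the optional $onlyKeys filter, and the sample blocks are all made up for illustration.

```php
<?php
// Sketch: text of a block, built from its WORD children.
function blockText(?array $block, array $blockMap): string
{
    $parts = [];
    foreach ($block['Relationships'] ?? [] as $rel) {
        if ($rel['Type'] !== 'CHILD') {
            continue;
        }
        foreach ($rel['Ids'] as $id) {
            $child = $blockMap[$id] ?? [];
            if (($child['BlockType'] ?? '') === 'WORD') {
                $parts[] = $child['Text'];
            }
        }
    }
    return implode(' ', $parts);
}

// Sketch: key text => value text for every KEY block in $blocks.
// A non-empty $onlyKeys keeps just the listed keys (exact match).
function extractKeyValues(array $blocks, array $onlyKeys = []): array
{
    $blockMap = array_column($blocks, null, 'Id'); // index blocks by Id
    $pairs = [];
    foreach ($blocks as $block) {
        if ($block['BlockType'] !== 'KEY_VALUE_SET'
            || !in_array('KEY', $block['EntityTypes'] ?? [], true)) {
            continue;
        }
        $key = blockText($block, $blockMap);
        if ($onlyKeys && !in_array($key, $onlyKeys, true)) {
            continue;
        }
        $value = '';
        foreach ($block['Relationships'] ?? [] as $rel) {
            if ($rel['Type'] !== 'VALUE') {
                continue;
            }
            foreach ($rel['Ids'] as $valueId) {
                $value = trim($value . ' ' . blockText($blockMap[$valueId] ?? null, $blockMap));
            }
        }
        $pairs[$key] = $value;
    }
    return $pairs;
}

// Hypothetical blocks, shaped like $result["Blocks"].
$sampleBlocks = [
    ['Id' => 'k1', 'BlockType' => 'KEY_VALUE_SET', 'EntityTypes' => ['KEY'],
     'Relationships' => [['Type' => 'VALUE', 'Ids' => ['v1']],
                         ['Type' => 'CHILD', 'Ids' => ['w1', 'w2']]]],
    ['Id' => 'v1', 'BlockType' => 'KEY_VALUE_SET', 'EntityTypes' => ['VALUE'],
     'Relationships' => [['Type' => 'CHILD', 'Ids' => ['w3']]]],
    ['Id' => 'w1', 'BlockType' => 'WORD', 'Text' => 'Invoice'],
    ['Id' => 'w2', 'BlockType' => 'WORD', 'Text' => 'No:'],
    ['Id' => 'w3', 'BlockType' => 'WORD', 'Text' => '1001'],
];

print_r(extractKeyValues($sampleBlocks));
```

Since the keys are whatever text appears on the document, selecting fields this way means matching on that printed text rather than on a normalized type.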

AWS
answered 2 years ago
0

Hi,

Following your directions, I ported the Python code example (from the shared link) to PHP, but I'm not getting the expected result for the file uploaded here: http://drnko.rs-technology.co.in/aws/receipt16.jpg

What am I missing?

Below is the code in PHP:

$result = $client->analyzeDocument($options);

$blocksMap = array();
   $tableBlocks = array();
   
   foreach ($result["Blocks"] as $block) {
      $blocksMap[$block["Id"]] = $block;
      if ($block["BlockType"]=="TABLE") {
         array_push($tableBlocks, $block);
      }
   }
   
foreach ($tableBlocks as $table) {
      foreach($table["Relationships"] as $relationship) {
         if ($relationship["Type"]=="CHILD") {
            $rowsMap=array();
            foreach($relationship["Ids"] as $childId) {
               $cell = $blocksMap[$childId];
               if (!is_null($cell) && isset($cell["BlockType"]) && $cell["BlockType"]=="CELL") {
                  $rowIndex = $cell["RowIndex"];
                  $colIndex = $cell["ColumnIndex"];
                  if (!isset($rowsMap[$rowIndex])) {
                     $rowsMap[$rowIndex] = array();
                  }
                  $rowsMap[$rowIndex][$colIndex] = getText($cell, $blocks_map);
                  
               }
            }
            print_r(json_encode($rowsMap));
         }
      }
   }
   
   function getText($cells, $blocks_map){
       $text = '';
        if (isset($cells["Relationships"])) {
            foreach($cells["Relationships"] as $relationship){
                
                if($relationship["Type"] == "CHILD") {
                    foreach($relationship["Ids"] as $child_id){    
                        $word = $blocks_map[$child_id];
                    if($word["BlockType"] == "WORD"){
                        $text += $word["Text"] + ' ';
                        print_r($text . '\n');
                    }
                    if($word["BlockType"] == "SELECTION_ELEMENT"){
                        if($word["SelectionStatus"] =="SELECTED"){
                            $text +=  'X ';
                        }
                    }
                    }
                }
            }
    return $text;
      
   }

}

Result output:

{"1":{"1":null,"2":null},"2":{"1":null,"2":null},"3":{"1":null,"2":null}}{"1":{"1":null,"2":null},"2":{"1":null,"2":null},"3":{"1":null,"2":null}}{"1":{"1":null,"2":null,"3":null},"2":{"1":null,"2":null,"3":null},"3":{"1":null,"2":null,"3":null}}{"1":{"1":null,"2":null,"3":null},"2":{"1":null,"2":null,"3":null},"3":{"1":null,"2":null,"3":null}}
Zakhid
answered 2 years ago
  • Please use the PHP code below for key-value pairs.

0
<?php
// Textract key-value pair extraction in PHP.
require 'vendor/autoload.php'; // Composer autoloader for the AWS SDK for PHP

use Aws\Textract\TextractClient;

function ParserFile($filename)
{
    $Val = get_kv_map($filename);
    $key_map = $Val["key_map"];
    $value_map = $Val["value_map"];
    $block_map = $Val["block_map"];
    $kvs = get_kv_relationship($key_map, $value_map, $block_map);
    print_r($kvs);
    die();
}

function get_kv_relationship($key_map, $value_map, $block_map)
{
    $kvs = []; // initialise so an empty key map still returns an array
    foreach ($key_map as $block_id => $key_block) {
        $value_block = find_value_block($key_block, $value_map);
        $key = get_text($key_block, $block_map);
        $val = get_text($value_block, $block_map);
        $kvs[$key] = $val;
    }
    return $kvs;
}

function find_value_block($key_block, $value_map)
{
    foreach ($key_block["Relationships"] ?? [] as $relationship) {
        if ($relationship["Type"] == "VALUE") {
            foreach ($relationship["Ids"] as $value_id) {
                // Return the first VALUE block linked to this key.
                return $value_map[$value_id] ?? null;
            }
        }
    }
    return null; // the key has no VALUE relationship
}
function get_text($result, $blocks_map)
{
    $text = "";
    // $result may be null when a key has no value block.
    if (is_array($result) && array_key_exists("Relationships", $result)) {
        foreach ($result["Relationships"] as $relationship) {
            if ($relationship["Type"] == "CHILD") {
                foreach ($relationship["Ids"] as $child_id) {
                    $word = $blocks_map[$child_id];
                    if ($word["BlockType"] == "WORD") {
                        // Append each word instead of overwriting the previous one.
                        $text .= $word["Text"] . " ";
                    }
                    if ($word["BlockType"] == "SELECTION_ELEMENT") {
                        if ($word["SelectionStatus"] == "SELECTED") {
                            $text .= "X ";
                        }
                    }
                }
            }
        }
    }

    return rtrim($text);
}
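function get_kv_map($filename)
{
    // These four settings must be defined in the global scope before the call.
    global $ClientRegion, $ClientVersion, $Accesskey, $EncryptedKey;

    $client = new TextractClient([
        "region" => $ClientRegion,
        "version" => $ClientVersion,
        "credentials" => [
            "key" => $Accesskey,
            "secret" => $EncryptedKey,
        ],
    ]);
    $file = fopen($filename, "rb");
    $contents = fread($file, filesize($filename));
    fclose($file); // close the handle once only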
    $options = [
        "Document" => [
            "Bytes" => $contents,
        ],
        "FeatureTypes" => ["FORMS", "TABLES"], // REQUIRED
    ];
    $result = $client->analyzeDocument($options);

    $blocks = $result["Blocks"];

    $key_map = [];
    $value_map = [];
    $block_map = [];
    foreach ($blocks as $block) {
        $block_id = $block["Id"];
        $block_map[$block_id] = $block;
        if ($block["BlockType"] == "KEY_VALUE_SET") {
            if (in_array("KEY", $block["EntityTypes"])) {
                $key_map[$block_id] = $block;
            } else {
                $value_map[$block_id] = $block;
            }
        }
    }

    return [
        "key_map" => $key_map,
        "value_map" => $value_map,
        "block_map" => $block_map,
    ];
}

$filename = "imagedir.png";
ParserFile($filename);

sultan
answered a year ago
