Hi,
From your code, it seems that on every loop iteration you replace `getResultsResponse` with the next subset of blocks retrieved with the `NextToken`.
In a full response, the WORD and LINE blocks usually come before the CELL blocks that reference them. On the first iteration you therefore have the WORD, LINE and CELL blocks in `getResultsResponse`, which is why you can find the child elements. However, when you fetch the next set of blocks you may get only CELL blocks, so when you look up their children, those children are not in that subset.
For example, if the full response is [ PAGE_1, WORD_1, WORD_2, TABLE_1, CELL_1, CELL_2 ] and each response holds at most 5 blocks before you have to move on to the next token, `getResultsResponse` will look like this:
- 1st iteration: `getResultsResponse` = [ PAGE_1, WORD_1, WORD_2, TABLE_1, CELL_1 ], so you will find WORD_1, the child of CELL_1.
- 2nd iteration: `getResultsResponse` = [ CELL_2 ], so WORD_2, the child of CELL_2, is not part of this list. It was returned in the first iteration, which is why you cannot find it.
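The failure in the example above can be reproduced with a small Python simulation. The blocks are plain dicts shaped loosely like entries of Textract's `Blocks` list (`Id`, `BlockType`, `Relationships`), and the page size of 5 is an assumption taken from the example, not a Textract default. The helper names `paginate` and `missing_children_per_page` are hypothetical.

```python
# Toy blocks mirroring the example: each CELL has a CHILD relationship to one WORD.
FULL_RESPONSE = [
    {"Id": "PAGE_1", "BlockType": "PAGE"},
    {"Id": "WORD_1", "BlockType": "WORD", "Text": "foo"},
    {"Id": "WORD_2", "BlockType": "WORD", "Text": "bar"},
    {"Id": "TABLE_1", "BlockType": "TABLE"},
    {"Id": "CELL_1", "BlockType": "CELL",
     "Relationships": [{"Type": "CHILD", "Ids": ["WORD_1"]}]},
    {"Id": "CELL_2", "BlockType": "CELL",
     "Relationships": [{"Type": "CHILD", "Ids": ["WORD_2"]}]},
]

PAGE_SIZE = 5  # assumed page size, taken from the example above


def paginate(blocks, page_size):
    """Split blocks into pages, imitating successive NextToken responses."""
    return [blocks[i:i + page_size] for i in range(0, len(blocks), page_size)]


def missing_children_per_page(pages):
    """Buggy approach: resolve CELL children only within the current page."""
    missing = []
    for page in pages:
        ids_on_page = {b["Id"] for b in page}
        for block in page:
            if block["BlockType"] != "CELL":
                continue
            for rel in block.get("Relationships", []):
                for child_id in rel["Ids"]:
                    if child_id not in ids_on_page:
                        missing.append(child_id)
    return missing


pages = paginate(FULL_RESPONSE, PAGE_SIZE)
print([len(p) for p in pages])           # [5, 1]
print(missing_children_per_page(pages))  # ['WORD_2']
```

Only WORD_2 goes missing, because its parent CELL_2 is the only cell that lands on a different page than its child.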
A fix is to first fetch every page of the response and concatenate all the blocks into a single list. Then you can run your script to look up the children within this combined list.
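A minimal Python sketch of that fix, assuming the pages have already been fetched as lists of block dicts (the toy data and the helper names `collect_blocks` and `cell_texts` are hypothetical): index every block from every page by `Id` first, then resolve each CELL's CHILD ids against that complete index.

```python
def collect_blocks(pages):
    """Concatenate every page's blocks and index them by Id."""
    by_id = {}
    for page in pages:
        for block in page:
            by_id[block["Id"]] = block
    return by_id


def cell_texts(by_id):
    """Resolve each CELL's CHILD ids against the full index, not one page."""
    result = {}
    for block in by_id.values():
        if block["BlockType"] != "CELL":
            continue
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for child_id in rel["Ids"]:
                child = by_id.get(child_id)
                if child is not None and child.get("Text"):
                    words.append(child["Text"])
        result[block["Id"]] = " ".join(words)
    return result


# Same toy data as the example: CELL_2 is on a different page than WORD_2.
pages = [
    [{"Id": "PAGE_1", "BlockType": "PAGE"},
     {"Id": "WORD_1", "BlockType": "WORD", "Text": "foo"},
     {"Id": "WORD_2", "BlockType": "WORD", "Text": "bar"},
     {"Id": "TABLE_1", "BlockType": "TABLE"},
     {"Id": "CELL_1", "BlockType": "CELL",
      "Relationships": [{"Type": "CHILD", "Ids": ["WORD_1"]}]}],
    [{"Id": "CELL_2", "BlockType": "CELL",
      "Relationships": [{"Type": "CHILD", "Ids": ["WORD_2"]}]}],
]
print(cell_texts(collect_blocks(pages)))  # {'CELL_1': 'foo', 'CELL_2': 'bar'}
```

A dict keyed by `Id` makes each child lookup constant-time, which matters once a multi-page document has tens of thousands of blocks.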
Hope this helps.
To add to the response above, please review the implementation of the `get_full_json` function in Python linked below. It implements the recommended fix.
https://github.com/aws-samples/amazon-textract-textractor/blob/master/caller/textractcaller/t_call.py
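The fetching side of the fix can be sketched in Python as a loop that follows `NextToken` until it is absent. This is written in the spirit of `get_full_json`, not copied from it. `client` is assumed to behave like a boto3 Textract client (`boto3.client("textract")`, whose `get_document_analysis` takes `JobId` and an optional `NextToken`). `get_all_blocks` and the `FakeTextract` class are hypothetical helpers that let the sketch run offline.

```python
def get_all_blocks(client, job_id):
    """Follow NextToken until exhausted and return every block in one list."""
    kwargs = {"JobId": job_id}
    all_blocks = []
    while True:
        response = client.get_document_analysis(**kwargs)
        if response["JobStatus"] != "SUCCEEDED":
            raise RuntimeError(f"job {job_id} is {response['JobStatus']}")
        all_blocks.extend(response.get("Blocks", []))
        token = response.get("NextToken")
        if not token:
            return all_blocks
        kwargs["NextToken"] = token


class FakeTextract:
    """Hypothetical stand-in that serves pre-split pages via NextToken."""

    def __init__(self, pages):
        self.pages = pages

    def get_document_analysis(self, JobId, NextToken=None):
        index = int(NextToken) if NextToken else 0
        response = {"JobStatus": "SUCCEEDED", "Blocks": self.pages[index]}
        if index + 1 < len(self.pages):
            response["NextToken"] = str(index + 1)
        return response


fake = FakeTextract([[{"Id": "WORD_1"}, {"Id": "CELL_1"}], [{"Id": "CELL_2"}]])
blocks = get_all_blocks(fake, "example-job-id")
print([b["Id"] for b in blocks])  # ['WORD_1', 'CELL_1', 'CELL_2']
```

Raising on any status other than SUCCEEDED is a simplifying choice. A real caller would typically wait while the job is IN_PROGRESS and decide separately how to treat PARTIAL_SUCCESS.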
Collecting all the blocks up front in a Dictionary worked! Thank you!
Here is the modified code...
```csharp
// Requires: using System; using System.Collections.Generic;
//           using Amazon.Textract; using Amazon.Textract.Model;
// Collect every block from every page first, keyed by Id, so that
// CHILD relationships can be resolved across page boundaries.
Dictionary<string, Block> blocks = new Dictionary<string, Block>();
if (getResultsResponse.JobStatus == JobStatus.SUCCEEDED)
{
    do
    {
        getResultsResponse.Blocks.ForEach(x =>
        {
            blocks.Add(x.Id, x);
        });
        if (string.IsNullOrEmpty(getResultsResponse.NextToken)) { break; }
        getResultsRequest.NextToken = getResultsResponse.NextToken;
        getResultsResponse = _textractClient.GetDocumentAnalysis(getResultsRequest);
    } while (getResultsResponse.Blocks.Count > 0);

    // Every child lookup now runs against the complete set of blocks.
    foreach (KeyValuePair<string, Block> entry in blocks)
    {
        if (entry.Value.BlockType == BlockType.CELL)
        {
            Console.WriteLine("Page: " + entry.Value.Page.ToString());
            Console.WriteLine("Rowindex: " + entry.Value.RowIndex.ToString());
            Console.WriteLine("Colindex: " + entry.Value.ColumnIndex.ToString());
            // An empty cell may have no relationships at all.
            (entry.Value.Relationships ?? new List<Relationship>()).ForEach(y =>
            {
                y.Ids.ForEach(z =>
                {
                    if (blocks.TryGetValue(z, out Block child))
                    {
                        Console.Write(string.IsNullOrEmpty(child.Text) ? "EMPTY " : $"{child.Text} ");
                    }
                });
            });
            Console.WriteLine(); // end the line of cell text
        }
    }
}
```