Does Textract have a way to extract a table of specified dimensions?


I am trying to pull the SKUs, name, and price from Target and Walmart receipts, but when I try expense analysis, it only pulls the name and price, ignoring the SKUs, and when I do document analysis, it will often lump the SKU column in with the name column. Sometimes, it will just add a column on the end for no apparent reason. I need the SKUs to be associated with the name and price, so I can't just look for things matching the format of SKUs. Is there a way for me to specify that I want 3 columns from the table when doing document analysis?

Hello, Amazon Textract has a dedicated API for analyzing receipts and invoices, AnalyzeExpense (link). If you have not tried it yet, I'd recommend having a look.
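Here is a minimal sketch of reading AnalyzeExpense output with boto3, assuming the receipt is a local image file. Each line item is kept as one record, so the SKU stays attached to its name and price. It relies on the standard line-item field types ITEM, PRICE, PRODUCT_CODE and EXPENSE_ROW. Whether PRODUCT_CODE is actually populated for Target or Walmart receipts is not guaranteed, and the helper names are hypothetical.

```python
"""Sketch: keep SKU, name and price together per line item from an
Amazon Textract AnalyzeExpense response."""

from typing import Any, Dict, List, Optional

# Standard line-item field types mapped to hypothetical output keys.
# PRODUCT_CODE may be missing for some receipt layouts.
WANTED_FIELDS = {
    "PRODUCT_CODE": "sku",
    "ITEM": "name",
    "PRICE": "price",
    "EXPENSE_ROW": "row",  # full text of the printed row
}


def line_items_from_expense(response: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Return one dict per line item; fields Textract did not detect are None."""
    rows: List[Dict[str, Optional[str]]] = []
    for doc in response.get("ExpenseDocuments", []):
        for group in doc.get("LineItemGroups", []):
            for item in group.get("LineItems", []):
                row: Dict[str, Optional[str]] = {key: None for key in WANTED_FIELDS.values()}
                for field in item.get("LineItemExpenseFields", []):
                    field_type = field.get("Type", {}).get("Text")
                    if field_type in WANTED_FIELDS:
                        row[WANTED_FIELDS[field_type]] = field.get("ValueDetection", {}).get("Text")
                rows.append(row)
    return rows


def analyze_receipt(path: str) -> List[Dict[str, Optional[str]]]:
    """Run AnalyzeExpense on a local image (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parser above works without boto3

    client = boto3.client("textract")
    with open(path, "rb") as f:
        response = client.analyze_expense(Document={"Bytes": f.read()})
    return line_items_from_expense(response)
```

Usage would be something like `for row in analyze_receipt("receipt.jpg"): print(row)` (the file name is a placeholder). If PRODUCT_CODE comes back empty, the EXPENSE_ROW text, which covers the whole printed row, may still contain the SKU next to the item name and price.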

If you already have, I'd check whether the Detect Text feature (link) is able to extract the content (this corresponds to the Raw text tab when running Document Analysis in the AWS Console).
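If the raw text is complete, one possible heuristic is to rebuild printed rows yourself from the DetectDocumentText LINE blocks. The sketch below assumes that the SKU, name and price of an item sit at roughly the same height on the page, so LINE blocks whose vertical centers fall within a small tolerance are grouped into one row. The tolerance value and function names are hypothetical and will likely need tuning.

```python
"""Sketch: group Amazon Textract DetectDocumentText LINE blocks into
printed rows by vertical position (a heuristic, not a Textract feature)."""

from typing import Any, Dict, List, Tuple


def rows_from_lines(response: Dict[str, Any], tolerance: float = 0.01) -> List[List[str]]:
    """Return rows of text fragments, each row ordered left to right.

    `tolerance` is a fraction of page height (hypothetical default).
    """
    lines: List[Tuple[float, float, str]] = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue
        box = block["Geometry"]["BoundingBox"]
        center_y = box["Top"] + box["Height"] / 2
        lines.append((center_y, box["Left"], block.get("Text", "")))

    rows: List[List[Tuple[float, float, str]]] = []
    for line in sorted(lines):  # top to bottom
        # Same row if close to the vertical center of the row's first line.
        if rows and abs(line[0] - rows[-1][0][0]) <= tolerance:
            rows[-1].append(line)
        else:
            rows.append([line])
    return [[text for _, _, text in sorted(row, key=lambda item: item[1])] for row in rows]


def detect_rows(path: str) -> List[List[str]]:
    """Run DetectDocumentText on a local image (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the grouping logic works without boto3

    client = boto3.client("textract")
    with open(path, "rb") as f:
        response = client.detect_document_text(Document={"Bytes": f.read()})
    return rows_from_lines(response)
```

Because grouping happens per printed row rather than by pattern matching, a SKU stays with the name and price beside it, which avoids relying on the SKU format alone.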

Overall, there are a few things that can improve the accuracy of Textract results, such as image preprocessing techniques like sharpening, blurring, thresholding, and rotation (for misaligned scanned documents). For more complex document layouts, an NLP model can be plugged in to help detect the content (link).
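As an illustration of those preprocessing steps, here is a minimal sketch using Pillow (an assumption; any imaging library would do). It converts to grayscale, applies a light median blur to remove speckle, sharpens, binarizes with a fixed threshold, and optionally rotates. The `threshold` and `angle` defaults are hypothetical and should be tuned per scan.

```python
"""Sketch: basic receipt image clean-up before sending it to Textract."""

import io

from PIL import Image, ImageFilter, ImageOps


def preprocess_receipt(image: Image.Image, angle: float = 0.0, threshold: int = 160) -> bytes:
    """Return PNG bytes after grayscale, blur, sharpen, threshold and rotation."""
    img = ImageOps.grayscale(image)
    img = img.filter(ImageFilter.MedianFilter(size=3))  # median blur removes speckle noise
    img = img.filter(ImageFilter.SHARPEN)  # restore edge contrast on characters
    img = img.point(lambda p: 255 if p > threshold else 0)  # binarize: dark text, white paper
    if angle:
        # Counter-clockwise rotation in degrees; expand keeps the corners,
        # and the newly exposed area is filled with white.
        img = img.rotate(angle, expand=True, fillcolor=255)
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()
```

The returned bytes can be passed as `Document={"Bytes": ...}` to the boto3 Textract calls, for example `analyze_document(..., FeatureTypes=["TABLES"])`. Thresholding tends to help most with faded thermal-paper receipts, while rotation only matters for skewed scans.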

