How to Boost Performance of Amazon Textract Table Extraction

Hi, I am using Textract on images of tables and have noticed quite poor performance on denser tables. Are there any parameters I can tweak to improve the results on the example below? I'm surprised that the AI cannot tell how gridlines divide cells.

I'm using the Python API with the code below:

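from textractor import Textractor
from textractor.data.constants import TextractFeatures

# Initialize the Textractor client; change the region if required
extractor = Textractor(region_name="us-east-1")

# `image` is the input defined elsewhere (e.g. a local file path or a PIL image)
document = extractor.analyze_document(
    file_source=image,
    features=[TextractFeatures.TABLES],
    save_image=True,  # keep True to see bounding boxes while testing
)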

[Image: example table that Textract fails on]

Asked 10 months ago · 212 views
2 Answers
Thank you for bringing this issue to our attention. With machine learning models, we cannot guarantee 100% accuracy. We are continuously improving the accuracy of our models in response to customer feedback. Please share at least 10 documents that are representative of your production traffic to help us better diagnose the issue, and refer to our best practices to optimize how you use Textract.

AWS
Answered 9 months ago
  • I posted 10 examples in this thread

Sure, here are 10 examples.

[10 example images]

Answered 9 months ago
