Training Textract to Separate Text Blocks Into Separate Components with a Delimiter


I'm trying to use Textract to extract the product descriptions from our PDF catalogs in page order. The Textract analysis picks up the descriptions as text blocks, but how do I go about training Textract to split each product description text block into its key components, such as title, author, description, etc.?


Asked a month ago, 101 views
1 answer

Hello,

To extract key components such as title, author, and description from the product descriptions in your PDF catalogs: Textract does not currently have a built-in way to split a detected text block into custom fields like these.

A machine learning model trained on sample catalog pages could automatically classify the text into those fields. Services such as Amazon SageMaker and AWS Glue can help you build such a model.
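To illustrate the idea of learning field labels from labeled examples, here is a tiny nearest-centroid classifier in plain Python. It is a stand-in for a real model, not a SageMaker workflow: the training rows, the three features (line height as a font-size proxy, a leading "by" as an author cue, and word count), and all function names are hypothetical. A production model would be trained on many labeled lines from your actual catalog pages.

```python
import math

# Hypothetical hand-labeled training lines: (text, bounding-box height, label).
# Heights are fractions of page height, as in Textract's BoundingBox.
TRAINING = [
    ("The Great Novel", 0.030, "title"),
    ("Cooking Basics", 0.031, "title"),
    ("Garden Guide", 0.029, "title"),
    ("by Jane Doe", 0.015, "author"),
    ("by John Smith", 0.015, "author"),
    ("A sweeping story of three generations.", 0.015, "description"),
    ("Recipes for beginners and home cooks.", 0.015, "description"),
]


def features(text, height):
    """Hypothetical layout/text features for one line of text."""
    words = text.split()
    return (
        height * 100,                                         # font-size proxy
        1.0 if words and words[0].lower() == "by" else 0.0,   # author cue
        min(len(words), 20) / 20,                             # normalized line length
    )


def train(rows):
    """Return the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for text, height, label in rows:
        f = features(text, height)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, value in enumerate(f):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def predict(centroids, text, height):
    """Assign the label whose centroid is closest in Euclidean distance."""
    f = features(text, height)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))


if __name__ == "__main__":
    model = train(TRAINING)
    for text, h in [("A Beginner's Atlas", 0.030),
                    ("by Ann Lee", 0.015),
                    ("Maps and stories for curious readers.", 0.016)]:
        print(text, "->", predict(model, text, h))
```

Nearest-centroid is chosen only because it fits in a few lines with no dependencies; with real data you would likely want richer features and a stronger classifier.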

Julian
Answered a month ago

EXPERT
Reviewed a month ago
  • You can develop a post-processing system that applies rules to classify text blocks based on layout patterns, or for a more sophisticated solution, train a custom machine learning model with Amazon SageMaker to recognize and categorize the text appropriately.
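A minimal sketch of the rule-based post-processing approach, assuming the input is the `Blocks` list from a Textract response (for example from boto3's `analyze_document`; multi-page PDFs go through the asynchronous `StartDocumentAnalysis`/`GetDocumentAnalysis` operations, which return the same block structure). Only the real `BlockType`, `Text`, `Page`, and `Geometry.BoundingBox` fields are used. The layout rules themselves are hypothetical catalog conventions: a vertical gap larger than `GAP_THRESHOLD` starts a new product, the tallest line is the title, a line starting with "by" is the author, and everything else is the description. Each product is then emitted as one delimited row.

```python
import re

GAP_THRESHOLD = 0.03  # hypothetical: gap (fraction of page height) between products
AUTHOR_RE = re.compile(r"^by\s+(.+)$", re.IGNORECASE)  # hypothetical author convention


def line(text, top, height, left=0.1, page=1):
    """Build a minimal LINE block shaped like Textract output (sample data only)."""
    return {"BlockType": "LINE", "Text": text, "Page": page,
            "Geometry": {"BoundingBox": {"Top": top, "Left": left,
                                         "Width": 0.5, "Height": height}}}


# Hypothetical sample, deliberately out of reading order.
SAMPLE_BLOCKS = [
    {"BlockType": "PAGE", "Page": 1},
    line("Cooking Basics", 0.30, 0.030),
    line("by John Smith", 0.34, 0.015),
    line("Recipes for beginners.", 0.36, 0.015),
    line("The Great Novel", 0.10, 0.030),
    line("by Jane Doe", 0.14, 0.015),
    line("A sweeping story of", 0.16, 0.015),
    line("three generations.", 0.18, 0.015),
]


def box(block):
    return block["Geometry"]["BoundingBox"]


def lines_in_page_order(blocks):
    """Keep LINE blocks, sorted by page, then top-to-bottom, then left-to-right."""
    lines = [b for b in blocks if b.get("BlockType") == "LINE"]
    return sorted(lines, key=lambda b: (b.get("Page", 1), box(b)["Top"], box(b)["Left"]))


def group_entries(lines):
    """Start a new product entry on a page change or a large vertical gap."""
    entries, prev = [], None
    for ln in lines:
        new_page = prev is None or ln.get("Page", 1) != prev.get("Page", 1)
        gap = 0.0 if new_page else box(ln)["Top"] - (box(prev)["Top"] + box(prev)["Height"])
        if new_page or gap > GAP_THRESHOLD:
            entries.append([])
        entries[-1].append(ln)
        prev = ln
    return entries


def classify(entry):
    """Tallest line -> title; first 'by ...' line -> author; the rest -> description."""
    title_block = max(entry, key=lambda b: box(b)["Height"])
    record = {"title": title_block["Text"], "author": None, "description": []}
    for ln in entry:
        if ln is title_block:
            continue
        match = AUTHOR_RE.match(ln["Text"])
        if match and record["author"] is None:
            record["author"] = match.group(1)
        else:
            record["description"].append(ln["Text"])
    record["description"] = " ".join(record["description"])
    return record


def extract_products(blocks, delimiter="|"):
    """Return structured records plus one delimiter-separated row per product."""
    records = [classify(e) for e in group_entries(lines_in_page_order(blocks))]
    rows = [delimiter.join([r["title"], r["author"] or "", r["description"]])
            for r in records]
    return records, rows


if __name__ == "__main__":
    _, rows = extract_products(SAMPLE_BLOCKS)
    for row in rows:
        print(row)
```

Running it prints `The Great Novel|Jane Doe|A sweeping story of three generations.` followed by `Cooking Basics|John Smith|Recipes for beginners.`. The thresholds and patterns would need tuning against your real catalog layout; multi-column pages in particular need column detection (for example by `Left`) before the top-to-bottom sort.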
