How does Textract process PDFs with searchable and selectable text compared to "scanned" PDFs?

I couldn't find any information on whether Textract works differently with these PDFs. I wonder whether Textract is even needed if the PDF already contains text (which is typically the case for machine-generated invoices and other documents). Textract still works very well with searchable PDFs.

My question is whether it makes sense to assess other services for extracting text. We're going to embed the text with an LLM, so we don't care much about form and shape, exact text locations, overlays, and so on.

Thank you!

Asked 3 years ago, 651 views
1 Answer

Assuming the text is always searchable/selectable, if you only plan to extract the raw text and a standard library does the job, then I'd agree with your assessment that Textract might be overkill. Where Textract really shines is when you do care about the format, structure, location of information, and relationship between blocks / sections of the document.
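As a rough illustration of the "standard library" route, here is a minimal sketch in Python. It assumes the third-party `pypdf` package as the PDF library (the answer above doesn't name one), and the helper names `read_text_layer` and `pages_needing_ocr` plus the `min_chars` threshold are hypothetical, chosen just for this example. The idea is to read the embedded text layer first and only flag pages that look empty as candidates for OCR (for example with Textract).

```python
from typing import List


def pages_needing_ocr(page_texts: List[str], min_chars: int = 20) -> List[int]:
    """Return 0-based indexes of pages whose text layer is empty or very short.

    min_chars is an arbitrary threshold (an assumption); tune it for your documents.
    """
    return [
        index
        for index, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


def read_text_layer(pdf_path: str) -> List[str]:
    """Read the embedded text of each page with pypdf (assumed to be installed)."""
    # Imported lazily so pages_needing_ocr can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # Normalize any empty result to "" so downstream code only sees strings.
    return [page.extract_text() or "" for page in reader.pages]
```

Typical use would be `texts = read_text_layer("invoice.pdf")` followed by `pages_needing_ocr(texts)`: if the list comes back empty, the text layer is probably enough for embedding, and only the flagged pages (scanned images, for instance) would need an OCR service.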

AWS
Answered 3 years ago
