1 Answer
- Newest
- Most votes
- Most comments
0
Hi,
I did a quick test strikethrough doesn't seem to work in PDF. But if you manage to convert it into an Image it works well (at least for my example below)
Thanks, Rama
Relevant content
- asked 2 years ago
- asked 2 years ago

Thank you! I'm not sure I understand. From what I can see, Textract is reading and including the strikethrough text. I want to do the exact opposite. I want option to remove the strikethrough text from extraction. Or, at least identify that there is strikethrough text. Is this possible?
Apologies, I didn't understand your question correctly. In this case, you may want to try Claude Sonnet 3 or the newer 3.5. It was able to detect strikethrough text. I think you discussed this here https://repost.aws/questions/QUyLEIglprTPiGEZN1L6j2eg/extracting-data-from-pdf-that-contains-strikeout-text-using-amazon-textract-in-python