1 回答
- 最新
- 投票最多
- 评论最多
0
For multi-column documents, I'd suggest trying getLineClustersInReadingOrder()
from the JavaScript/TypeScript version of the Amazon Textract Response parser (reading-order feature doc & code here on GitHub, package on NPM). It's only an approximate/rule-based heuristic though, so might struggle a bit with the warping in your example or need tuning the parameters a little.
For images like your second sample that are a bit more visually busy and light on text, it might be worth trying whether Rekognition DetectText can perform any better?