Response messages for long multipage documents


Hi,

I am using Textract with multipage PDFs. Some of the PDFs are 4+ pages long. In these cases, I've called Textract, received a job ID, and retrieved the results once the job reported success. These results only contain the detected text for the first 3 pages of the PDF.

To get the results for the remaining pages, do I have to wait for another message in the SQS queue? When this second message arrives with successful status, will I then be able to get results for all pages from a single get call or will these results only be for pages 4 onwards and I'll have to merge them with the results for the first 3 pages?

Thanks,
Kevin

Asked 3 years ago, 676 views
3 answers

Thanks for using Textract and reaching out. Just to check: are you using a paginated client to get all the results? Textract returns at most 1000 blocks per response, so the results for page 4 may be in a later page of the paginated output. If that's not the case, have you tried splitting the 4th page out of the PDF and sending it to Textract on its own to see whether it returns any results? If the page contains invalid content (e.g. a JPEG 2000 image), Textract may fail to extract anything from that whole page.
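For reference, here is a minimal sketch of what following the pagination could look like in Python. It assumes the job was started with asynchronous text detection, so the results are read with boto3's `get_document_text_detection`; the helper name `get_all_blocks` is made up for illustration, and the function takes the client as a parameter so it can be tried without AWS credentials.

```python
from typing import Any, Dict, List


def get_all_blocks(client: Any, job_id: str) -> List[Dict[str, Any]]:
    """Collect the blocks from every result page of one Textract job.

    `client` is assumed to behave like boto3.client("textract").
    `get_all_blocks` is a hypothetical helper, not part of boto3.
    """
    blocks: List[Dict[str, Any]] = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            # Ask for the next page of results of the same job.
            kwargs["NextToken"] = next_token
        response = client.get_document_text_detection(**kwargs)
        blocks.extend(response.get("Blocks", []))
        # A response without NextToken is the last page of results.
        next_token = response.get("NextToken")
        if not next_token:
            break
    return blocks
```

With real credentials this would be called as `get_all_blocks(boto3.client("textract"), job_id)`. Every call uses the same job ID, so the merged list should cover all pages of the PDF without waiting for a second completion message. If the job was started with document analysis instead, `get_document_analysis` should follow the same `NextToken` pattern.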

AWS
Answered 3 years ago

Thanks for the response.

To get the results, I'm just grabbing the blocks. I haven't used the paginated feature yet. I'll have to take a look into that with the 'next token' etc.

Otherwise, yes. I'll try the fourth page on its own to see if that's where the issue is occurring.

Thanks!

Answered 3 years ago

I just tried it and you were exactly right. I had to grab the 'NextToken' to get the rest of the paginated results. Thanks again for your help!

Answered 3 years ago
