2 Answers
1
I think you should add a check between
response = textract.analyze_document(
    Document={
        'S3Object': {
            'Bucket': bucket_name,
            'Name': document_name
        }
    },
    FeatureTypes=["TABLES"])
and
doc = Document(response)
to handle the case where no table is extracted from the PDF file:
table_blocks = [block for block in response['Blocks'] if block['BlockType'] == 'TABLE']
if not table_blocks:
    print("No tables found in the document.")
else:
    # process table data here
    doc = Document(response)
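To try the check without calling Textract, the same filter can be wrapped in a small helper and run against hand-written response dicts. This is only a sketch: `find_table_blocks` and the two sample responses are made up for illustration, and real responses contain many more fields per block.

```python
def find_table_blocks(response):
    """Return the TABLE blocks of a Textract-style response dict (possibly empty)."""
    # .get() keeps the helper from raising if 'Blocks' or 'BlockType' is missing.
    return [b for b in response.get('Blocks', []) if b.get('BlockType') == 'TABLE']


# Hypothetical, hand-written responses used only to exercise the helper.
no_tables = {'Blocks': [{'BlockType': 'PAGE'}, {'BlockType': 'LINE'}]}
one_table = {'Blocks': [{'BlockType': 'PAGE'}, {'BlockType': 'TABLE', 'Id': 't-1'}]}

for name, resp in (('no_tables', no_tables), ('one_table', one_table)):
    tables = find_table_blocks(resp)
    if not tables:
        print(f"{name}: no tables found, skipping table processing")
    else:
        print(f"{name}: {len(tables)} table(s) found")
```

With a real response, you would only build the Document and read its tables when the helper returns a non-empty list.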
0
Hi there,
For PDFs, especially multi-page ones, you should use start_document_analysis. You can update your code to something similar:
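import time

response = textract.start_document_analysis(
    DocumentLocation={
        'S3Object': {
            'Bucket': bucket_name,
            'Name': document_name
        }
    },
    FeatureTypes=["TABLES"])
# The call is asynchronous and returns only a JobId, so poll for the result.
while True:
    result = textract.get_document_analysis(JobId=response['JobId'])
    if result['JobStatus'] != 'IN_PROGRESS':
        break
    time.sleep(5)
doc = Document(result)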
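Note that start_document_analysis is asynchronous: it returns a JobId, and the blocks are fetched with get_document_analysis, whose output can be split across several pages linked by NextToken. Below is a minimal sketch of waiting for the job and collecting every page. The helper name `collect_analysis_pages`, the 5-second delay and the attempt limit are my own assumptions, not part of the Textract API, and anything other than SUCCEEDED (including PARTIAL_SUCCESS) is treated as an error here for simplicity.

```python
import time


def collect_analysis_pages(client, job_id, delay=5.0, max_attempts=60):
    """Poll a Textract analysis job, then return all result pages as a list."""
    # Wait until the job leaves the IN_PROGRESS state.
    for _ in range(max_attempts):
        page = client.get_document_analysis(JobId=job_id)
        if page['JobStatus'] != 'IN_PROGRESS':
            break
        time.sleep(delay)
    else:
        raise TimeoutError(f"Job {job_id} did not finish in time")

    if page['JobStatus'] != 'SUCCEEDED':
        message = page.get('StatusMessage', '')
        raise RuntimeError(f"Job {job_id} ended with {page['JobStatus']}: {message}")

    # Follow NextToken until every page of blocks has been fetched.
    pages = [page]
    while 'NextToken' in page:
        page = client.get_document_analysis(JobId=job_id, NextToken=page['NextToken'])
        pages.append(page)
    return pages
```

Usage would look like pages = collect_analysis_pages(textract, response['JobId']). If Document here comes from the amazon-textract-response-parser (trp) package, it should accept that list of pages directly, but I have not checked this against your setup.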
Answered 7 months ago
You mention it works with a "similar pdf" without an error. Can you check whether the same document works in the AWS web console? If it works in the console, it should work through the API as well, because the console uses the API in the background.