2 Answers
1
I think you should add a check between
response = textract.analyze_document(
    Document={
        'S3Object': {
            'Bucket': bucket_name,
            'Name': document_name
        }
    },
    FeatureTypes=["TABLES"])
and
doc = Document(response)
in case no table was extracted from the PDF file:
table_blocks = [block for block in response['Blocks'] if block['BlockType'] == 'TABLE']
if not table_blocks:
    print("No tables found in the document.")
else:
    # process table data here
    doc = Document(response)
0
Hi there,
For PDFs, especially multi-page ones, you should use the asynchronous start_document_analysis. You can update your code to something similar:
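response = textract.start_document_analysis(
    DocumentLocation={
        'S3Object': {
            'Bucket': bucket_name,
            'Name': document_name
        }
    },
    FeatureTypes=["TABLES"])
# The call is asynchronous: response holds only a JobId, not the blocks.
# Fetch the results with get_document_analysis once the job has finished,
# and build the Document from those results rather than from this response.
job_id = response['JobId']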
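Note that start_document_analysis only starts a job and returns a JobId, so the blocks have to be fetched separately with get_document_analysis. A minimal sketch of that polling step could look like the following. The helper name wait_for_analysis and its poll_seconds / max_wait_seconds parameters are my own assumptions for illustration, not part of boto3; the textract argument is expected to be a client created with boto3.client("textract").

```python
import time


def wait_for_analysis(textract, job_id, poll_seconds=5, max_wait_seconds=600):
    """Poll an asynchronous Textract analysis job and return all of its blocks.

    `textract` is assumed to be a boto3 Textract client (hypothetical helper).
    """
    waited = 0
    while True:
        result = textract.get_document_analysis(JobId=job_id)
        status = result["JobStatus"]
        if status in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            break
        if status == "FAILED":
            raise RuntimeError(result.get("StatusMessage", "Textract job failed"))
        if waited >= max_wait_seconds:
            raise TimeoutError(f"Job {job_id} still {status} after {waited}s")
        time.sleep(poll_seconds)
        waited += poll_seconds

    # Results are paginated; follow NextToken until every page is collected.
    blocks = list(result.get("Blocks", []))
    while "NextToken" in result:
        result = textract.get_document_analysis(
            JobId=job_id, NextToken=result["NextToken"]
        )
        blocks.extend(result.get("Blocks", []))
    return blocks
```

You would call it as blocks = wait_for_analysis(textract, response['JobId']) and then apply the same 'TABLE' check on blocks before processing. For production use, a completion notification via the NotificationChannel parameter is probably a better fit than polling.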
answered 6 months ago
You mention it works with a "similar pdf" without an error. Can you check whether the same document works in the AWS web console? If it works in the console, it should work through the API as well, because the console uses the API in the background.