2 answers
1
I think you should add a check between
    response = textract.analyze_document(
        Document={
            'S3Object': {
                'Bucket': bucket_name,
                'Name': document_name
            }
        },
        FeatureTypes=["TABLES"])
and
    doc = Document(response)
in case no table is extracted from the PDF file:
    table_blocks = [block for block in response['Blocks'] if block['BlockType'] == 'TABLE']
    if not table_blocks:
        print("No tables found in the document.")
    else:
        # process table data here
        doc = Document(response)
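To try the check without calling Textract, here is a minimal sketch that wraps it in a helper. The name `find_table_blocks` and the two trimmed sample responses are made up for illustration; a real response carries many more fields per block.

```python
def find_table_blocks(response):
    """Return the TABLE blocks from a Textract analyze_document response."""
    # .get() guards against a response that has no 'Blocks' key at all
    return [b for b in response.get("Blocks", []) if b.get("BlockType") == "TABLE"]


# Hypothetical, heavily trimmed responses, for illustration only
with_table = {"Blocks": [{"BlockType": "PAGE"}, {"BlockType": "TABLE", "Id": "t1"}]}
without_table = {"Blocks": [{"BlockType": "PAGE"}, {"BlockType": "LINE"}]}

for name, resp in [("with_table", with_table), ("without_table", without_table)]:
    tables = find_table_blocks(resp)
    if not tables:
        print(f"{name}: no tables found in the document.")
    else:
        print(f"{name}: {len(tables)} table(s) found.")
```

In the real script you would call `find_table_blocks(response)` right after `analyze_document` and only build the `Document` when the list is non-empty.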
0
Hi there,
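For multi-page PDFs, you should use start_document_analysis (analyze_document only handles single-page documents). The call is asynchronous: it returns a JobId, and you fetch the blocks with get_document_analysis once the job has finished. You can update your code to something similar:
    response = textract.start_document_analysis(
        DocumentLocation={
            'S3Object': {
                'Bucket': bucket_name,
                'Name': document_name
            }
        },
        FeatureTypes=["TABLES"])
    # The response holds a JobId, not the Blocks
    job_id = response['JobId']
    # Wait until JobStatus is SUCCEEDED (polling omitted here), then fetch the results
    result = textract.get_document_analysis(JobId=job_id)
    doc = Document(result)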
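Since start_document_analysis only starts a job and returns a JobId, the results have to be fetched separately with get_document_analysis. Below is a rough sketch of a polling loop, not a production implementation. The helper name `wait_for_analysis` and the `delay` and `max_attempts` values are assumptions for illustration, and `textract` is expected to be a boto3 Textract client.

```python
import time


def wait_for_analysis(textract, job_id, delay=5, max_attempts=60):
    """Poll get_document_analysis until the job finishes, then return all result pages."""
    for _ in range(max_attempts):
        result = textract.get_document_analysis(JobId=job_id)
        status = result["JobStatus"]
        if status in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            break
        if status == "FAILED":
            raise RuntimeError(result.get("StatusMessage", "Textract job failed"))
        # Still IN_PROGRESS: wait before asking again
        time.sleep(delay)
    else:
        raise TimeoutError(f"Job {job_id} still running after {max_attempts} polls")

    # Large results are paginated; follow NextToken until it is absent
    pages = [result]
    while "NextToken" in result:
        result = textract.get_document_analysis(
            JobId=job_id, NextToken=result["NextToken"]
        )
        pages.append(result)
    return pages
```

You would call it as `pages = wait_for_analysis(textract, response['JobId'])`. If `Document` here comes from the trp package, it should accept that list of pages directly, but that is worth checking against the version you have installed.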
answered 7 months ago
You mention it works with a "similar pdf" without an error. Can you check whether the same document works in the AWS web console? If it works in the console, it should work through the API as well, because the console uses the API in the background.