2 Answers
I think you should add a check between
import boto3

textract = boto3.client('textract')
response = textract.analyze_document(
    Document={
        'S3Object': {
            'Bucket': bucket_name,
            'Name': document_name
        }
    },
    FeatureTypes=["TABLES"])
and
doc = Document(response)
in case no tables are extracted from the PDF file:
from trp import Document

# Collect every TABLE block that Textract returned
table_blocks = [block for block in response['Blocks'] if block['BlockType'] == 'TABLE']
if not table_blocks:
    print("No tables found in the document.")
else:
    # Process the table data here
    doc = Document(response)
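If you would rather not depend on trp for the "process table data here" step, the raw response already links each TABLE block to its CELL blocks, and each CELL block to its WORD blocks, through CHILD relationships (RowIndex and ColumnIndex are 1-based). A minimal sketch that rebuilds every table as rows of cell text; tables_from_blocks is a hypothetical helper name, and it assumes a response dict shaped like the analyze_document output. It ignores merged cells and selection elements.

```python
from typing import Any


def tables_from_blocks(response: dict[str, Any]) -> list[list[list[str]]]:
    """Rebuild each TABLE block as rows of cell text (hypothetical helper)."""
    # Index every block by its Id so relationships can be resolved
    by_id = {b["Id"]: b for b in response.get("Blocks", [])}

    def child_ids(block: dict[str, Any]):
        # Yield the Ids listed in the block's CHILD relationships
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                yield from rel["Ids"]

    tables = []
    for block in by_id.values():
        if block["BlockType"] != "TABLE":
            continue
        grid: dict[tuple[int, int], str] = {}
        for cid in child_ids(block):
            cell = by_id.get(cid)
            if cell is None or cell["BlockType"] != "CELL":
                continue  # skip MERGED_CELL, titles, etc.
            # Join the WORD children of the cell; empty cells become ""
            words = [by_id[w]["Text"] for w in child_ids(cell)
                     if w in by_id and by_id[w]["BlockType"] == "WORD"]
            grid[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        if not grid:
            tables.append([])
            continue
        n_rows = max(r for r, _ in grid)
        n_cols = max(c for _, c in grid)
        tables.append([[grid.get((r, c), "") for c in range(1, n_cols + 1)]
                       for r in range(1, n_rows + 1)])
    return tables
```

Because it returns an empty list when the response has no TABLE blocks, the same call also covers the "no tables found" check.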
Hi there,
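For multi-page PDFs, you should use start_document_analysis, since the synchronous analyze_document only supports single-page PDF and TIFF files. Keep in mind that start_document_analysis is asynchronous: it returns a JobId, and you retrieve the blocks with get_document_analysis once the job has finished. You can update your code to something like this: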
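import time
import boto3
from trp import Document

textract = boto3.client('textract')
# Asynchronous API: the response contains a JobId, not the analysis blocks
response = textract.start_document_analysis(
    DocumentLocation={'S3Object': {'Bucket': bucket_name, 'Name': document_name}},
    FeatureTypes=["TABLES"])
job_id = response['JobId']
# Poll until the job finishes; large results are split into pages via NextToken
result = textract.get_document_analysis(JobId=job_id)
while result['JobStatus'] == 'IN_PROGRESS':
    time.sleep(5)
    result = textract.get_document_analysis(JobId=job_id)
doc = Document(result)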
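get_document_analysis returns large results in several responses linked by NextToken, and a job can also end with the status FAILED. A minimal sketch that waits for the job and gathers every result page; collect_analysis_pages and poll_seconds are hypothetical names, and textract is assumed to be a boto3 Textract client (or anything with the same get_document_analysis method).

```python
import time
from typing import Any


def collect_analysis_pages(textract: Any, job_id: str, poll_seconds: float = 5.0) -> list[dict]:
    """Wait for a Textract analysis job and return every result page (hypothetical helper)."""
    # Poll until the job leaves IN_PROGRESS (SUCCEEDED, FAILED or PARTIAL_SUCCESS)
    while True:
        page = textract.get_document_analysis(JobId=job_id)
        status = page["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    if status == "FAILED":
        raise RuntimeError(page.get("StatusMessage", "Textract job failed"))
    pages = [page]
    # Follow NextToken until the last page of blocks has been fetched
    while "NextToken" in page:
        page = textract.get_document_analysis(JobId=job_id, NextToken=page["NextToken"])
        pages.append(page)
    return pages
```

The trp Document constructor accepts either a single response or a list of responses, so something like `doc = Document(collect_analysis_pages(textract, job_id))` should parse all pages at once.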
answered 7 months ago
You mention it works with "similar pdf" without an error. Can you validate that the same document works in the AWS Web console? If it works in the console, it should work through API as well, because the console uses the API in the background.