2 Answers
OK, I think you should add a check between

```python
response = textract.analyze_document(
    Document={
        'S3Object': {
            'Bucket': bucket_name,
            'Name': document_name
        }
    },
    FeatureTypes=["TABLES"])
```

and

```python
doc = Document(response)
```

in case no table is extracted from the PDF file:
```python
table_blocks = [block for block in response['Blocks'] if block['BlockType'] == 'TABLE']
if not table_blocks:
    print("No tables found in the document.")
else:
    # Process the table data here
    doc = Document(response)
```
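If you want to do the same check without the `Document` wrapper, you can work on the raw `Blocks` list and turn each `TABLE` block into rows of text. This is a minimal sketch, assuming the usual Textract response layout: a `TABLE` block links to its `CELL` blocks through `CHILD` relationships, each `CELL` carries 1-based `RowIndex`/`ColumnIndex`, and links to its `WORD` blocks the same way. The function name `tables_from_blocks` is hypothetical, and merged cells, selection elements, and confidence scores are ignored.

```python
from typing import Any


def tables_from_blocks(blocks: list[dict[str, Any]]) -> list[list[list[str]]]:
    """Return every TABLE as a grid (list of rows) of cell text. Hypothetical helper."""
    by_id = {block['Id']: block for block in blocks}

    def children(block: dict[str, Any], block_type: str) -> list[dict[str, Any]]:
        # Follow CHILD relationships and keep only children of the requested type.
        found = []
        for rel in block.get('Relationships', []):
            if rel['Type'] == 'CHILD':
                for child_id in rel['Ids']:
                    child = by_id.get(child_id)
                    if child is not None and child['BlockType'] == block_type:
                        found.append(child)
        return found

    tables = []
    for table in (b for b in blocks if b['BlockType'] == 'TABLE'):
        cells = children(table, 'CELL')
        if not cells:
            continue
        n_rows = max(cell['RowIndex'] for cell in cells)
        n_cols = max(cell['ColumnIndex'] for cell in cells)
        grid = [['' for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            words = [word['Text'] for word in children(cell, 'WORD')]
            # RowIndex and ColumnIndex are 1-based, so shift them to 0-based.
            grid[cell['RowIndex'] - 1][cell['ColumnIndex'] - 1] = ' '.join(words)
        tables.append(grid)
    return tables
```

With that helper, the check becomes `tables = tables_from_blocks(response['Blocks'])` followed by `if not tables: print("No tables found in the document.")`.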
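Hi there,

For multi-page PDFs, you should use the asynchronous `start_document_analysis` operation; the synchronous `analyze_document` only accepts single-page PDF files. You can update your code to something like this: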
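```python
import time

response = textract.start_document_analysis(
    DocumentLocation={
        'S3Object': {
            'Bucket': bucket_name,
            'Name': document_name
        }
    },
    FeatureTypes=["TABLES"])

# start_document_analysis is asynchronous: it returns a JobId, not Blocks,
# so fetch the result with get_document_analysis once the job has finished.
result = textract.get_document_analysis(JobId=response['JobId'])
while result['JobStatus'] == 'IN_PROGRESS':
    time.sleep(5)
    result = textract.get_document_analysis(JobId=response['JobId'])
doc = Document(result)
```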
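`start_document_analysis` only starts a job; the `Blocks` have to be fetched afterwards with `get_document_analysis`, and for larger documents the results are split across several responses linked by `NextToken`. Below is a minimal sketch that waits for the job, raises on `FAILED`, and collects every Block. The function name `get_all_analysis_blocks` and the 5-second default poll interval are assumptions for illustration; `client` is meant to be a boto3 Textract client, but any object with a compatible `get_document_analysis` method works.

```python
import time
from typing import Any


def get_all_analysis_blocks(client: Any, job_id: str,
                            poll_seconds: float = 5.0) -> list[dict[str, Any]]:
    """Wait for an asynchronous Textract analysis job and return all of its Blocks.

    Hypothetical helper; `client` is assumed to be a boto3 Textract client.
    """
    # Poll until the job is no longer running.
    while True:
        result = client.get_document_analysis(JobId=job_id)
        status = result['JobStatus']
        if status != 'IN_PROGRESS':
            break
        time.sleep(poll_seconds)

    if status == 'FAILED':
        raise RuntimeError(f"Textract job {job_id} failed: {result.get('StatusMessage', '')}")

    # Follow NextToken so that Blocks from every result page are collected.
    blocks = list(result.get('Blocks', []))
    next_token = result.get('NextToken')
    while next_token:
        page = client.get_document_analysis(JobId=job_id, NextToken=next_token)
        blocks.extend(page.get('Blocks', []))
        next_token = page.get('NextToken')
    return blocks
```

With boto3 you would call `get_all_analysis_blocks(textract, response['JobId'])` right after `start_document_analysis`. Polling keeps the example short; for production workloads, passing a `NotificationChannel` (an SNS topic) to `start_document_analysis` avoids busy-waiting.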
answered 7 months ago
You mentioned that it works with a "similar PDF" without an error. Can you check whether the same document works in the AWS web console? If it does, it should work through the API as well, because the console calls the same API in the background.