InvalidS3ObjectException using boto3 and textract


import boto3

region = boto3.Session().region_name

role = 'arn:aws:iam::xxxx'

bucket = 'xxxxx'
bucket_path = "https://s3-{}{}".format('us-west-1', 'xxxxx')

# S3 client, used to list the objects in the bucket
client = boto3.client(
    's3',
    region_name='us-west-1',
)

res = client.list_objects(Bucket=bucket)

# Key of the first object in the bucket
file = res['Contents'][0]['Key']

# Replace spaces in the key with underscores
filename = '_'.join(file.split(' '))

# Amazon Textract client
textract = boto3.client('textract')

# Call Amazon Textract
response = textract.detect_document_text(
    Document={
        'S3Object': {
            'Bucket': bucket,
            'Name': filename
        }
    }
)


InvalidS3ObjectException                  Traceback (most recent call last)
/var/folders/d0/gnksqzwn2fn46fjgrkp6045c0000gn/T/ipykernel_87475/ in <module>
     27 # Call Amazon Textract
---> 28 response = textract.detect_document_text(
     29     Document={
     30         'S3Object': {

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/botocore/ in _api_call(self, *args, **kwargs)
    389                     "%s() only accepts keyword arguments." % py_operation_name)
    390             # The "self" in this scope is referring to the BaseClient.
--> 391             return self._make_api_call(operation_name, kwargs)
    393         _api_call.__name__ = str(py_operation_name)

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/botocore/ in _make_api_call(self, operation_name, api_params)
    717             error_code = parsed_response.get("Error", {}).get("Code")
    718             error_class = self.exceptions.from_code(error_code)
--> 719             raise error_class(parsed_response, operation_name)
    720         else:
    721             return parsed_response

InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the DetectDocumentText operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
Asked 2 years ago · 559 views
1 Answer

I think the problem is that you're rewriting the filename (the object key) so that spaces are replaced with underscores. Once you do that, the name you pass to detect_document_text() is no longer the key of an object in the bucket, which is why Textract can't find it.

If you remove this line, it should work:

filename = '_'.join(file.split(' '))
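
As a rough sketch of what the call could look like with the key passed through unchanged: the helper names `s3_document` and `detect_first_object` below are made up for illustration and are not part of boto3. It assumes `s3` and `textract` are boto3 clients (for example `boto3.client('s3')` and `boto3.client('textract')`) and that the bucket contains at least one object.

```python
def s3_document(bucket, key):
    """Build the Document argument for Textract from an S3 key, unchanged."""
    # Hypothetical helper: the key is used exactly as given, spaces included.
    return {"S3Object": {"Bucket": bucket, "Name": key}}


def detect_first_object(s3, textract, bucket):
    """Run DetectDocumentText on the first object listed in the bucket.

    Hypothetical helper; `s3` and `textract` are assumed to be boto3 clients.
    """
    res = s3.list_objects(Bucket=bucket)
    # Use the key exactly as S3 reports it; do not rewrite spaces.
    key = res["Contents"][0]["Key"]
    return textract.detect_document_text(Document=s3_document(bucket, key))
```

With real clients this would be called as `detect_first_object(boto3.client('s3'), boto3.client('textract'), bucket)`. If the error persists with the unmodified key, the message itself points at the other two things to check: the region and the access permissions.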
AWS
Answered 2 years ago
