Getting InvalidS3ObjectException: Unable to get object metadata from S3

About two weeks ago, I started seeing the following error in our logs, produced by a "startDocAnalysis" call:
```
2021-06-09T03:52:54.215Z 5e1c9ff3-7b60-5522-b6bd-c6c9463127df ERROR InvalidS3ObjectException: Unable to get object metadata from S3. Check object key, region and/or access permissions.
    at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:688:14)
    at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:690:12)
    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
  code: 'InvalidS3ObjectException',
  time: 2021-06-09T03:52:54.212Z,
  requestId: '598d7ac4-00ac-4c25-b239-6c9deba0e3d1',
  statusCode: 400,
  retryable: false,
  retryDelay: 26.7125437006851
}
```

Our code hasn't changed. It automatically retries after a few seconds, and the second "startDocAnalysis" call succeeds.
I'm seeing this in the eu-west-2 region.
What is causing this behavior? I assume something changed in the underlying AWS infrastructure, either in S3 or in Textract.
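Because the automatic retry succeeds, the failure behaves like a transient one. Note that the SDK marks this error `retryable: false` (see the log above), so the SDK's built-in retries will not repeat the call; an application-level retry is needed. A minimal sketch of a retry-with-backoff wrapper, assuming the Textract call is passed in as a promise-returning function (with the v2 JavaScript SDK, something like `() => textract.startDocumentAnalysis(params).promise()`). The helper name `withRetry` and its options are hypothetical.

```javascript
// Hypothetical helper (not part of aws-sdk): retry an async call with exponential
// backoff when it fails with one of the listed error codes.
// Example call (aws-sdk v2):
//   withRetry(() => textract.startDocumentAnalysis(params).promise())
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(
  call,
  { retries = 3, baseDelayMs = 1000, retryOn = ["InvalidS3ObjectException"] } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const retriable = Boolean(err) && retryOn.includes(err.code);
      // Give up on non-retriable errors or once the retry budget is spent.
      if (!retriable || attempt >= retries) throw err;
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Only the listed error codes are retried, so genuine permission or key errors still surface immediately.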

Regards,
Albert

AlbertK
asked 3 years ago, 4469 views
7 Answers

Hi Albert,

Thank you for choosing Textract. I took a look at our logs, and it seems that for this request and two others, we couldn't get the object metadata from S3. Could you check that the object key, region, and access permissions are set up correctly? If everything looks right and it happens again, feel free to open a support ticket and we will investigate further.
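The key and permission checks above can be probed with an S3 HeadObject request on the same bucket and key before calling Textract. A minimal sketch, assuming aws-sdk v2-style errors that carry a numeric `statusCode`; `diagnoseHeadObject` is a hypothetical name, and the HeadObject call is injected so the sketch runs without AWS credentials. One S3 detail worth knowing: when the caller lacks `s3:ListBucket` on the bucket, S3 answers a request for a missing key with 403 rather than 404.

```javascript
// Hypothetical diagnostic: issue HeadObject and translate the outcome into a hint.
// `headObject` is injected; with aws-sdk v2 it could be
//   (p) => s3.headObject(p).promise()
async function diagnoseHeadObject(headObject, bucket, key) {
  try {
    const meta = await headObject({ Bucket: bucket, Key: key });
    return { ok: true, hint: "object is readable", contentLength: meta.ContentLength };
  } catch (err) {
    const status = err && err.statusCode;
    if (status === 404) {
      return { ok: false, hint: "key not found: check the key, or the object is not visible yet" };
    }
    if (status === 403) {
      // Without s3:ListBucket, S3 reports a missing key as 403 instead of 404.
      return { ok: false, hint: "access denied, or missing key without s3:ListBucket permission" };
    }
    return { ok: false, hint: `unexpected error (${err && err.code})` };
  }
}
```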

Thanks
Alan

AWS
Alan_L
answered 3 years ago

Hi Alan,

The files to be processed were there. They were uploaded to an S3 bucket as part of our pipeline before Textract processing started. And as I mentioned previously, our automatic retry kicked off a second Textract attempt, and that attempt succeeded.
This is new behavior; I started seeing it around May 25. Previously we did not have this issue (and our processing pipeline didn't change).
It's also worth mentioning that this doesn't happen for every file we process. For example, if we submit, say, 200 files, only one or two PDFs will hit this error, and in some runs it doesn't happen at all and everything works fine.

Regards,
Albert

AlbertK
answered 3 years ago

Hi Albert,

I took a deeper look. We see a 404 status code when trying to get the object metadata from S3. Could you let us know how you prepare the data (e.g. upload it to S3, then call Textract), and whether the data is encrypted with a customer managed key (CMK) or an S3-managed key? With that information, we can discuss with the S3 team why the data was not available immediately. It may be a consistency or availability issue on the S3 side.
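If the 404 means the object is not yet visible when Textract reads it, one client-side mitigation is to confirm the object is readable before starting the job. aws-sdk v2 ships a built-in waiter for this, `s3.waitFor('objectExists', { Bucket, Key }).promise()`; the sketch below spells out the same idea as an explicit polling loop. The names `waitForObject`, `attempts`, and `delayMs` are hypothetical, and `headObject` is injected so the sketch runs without AWS. It only checks visibility with your own credentials, so it is a guard, not a guarantee.

```javascript
// Hypothetical pre-flight check: poll HeadObject until the object is visible.
// `headObject` is injected; with aws-sdk v2 it could be (p) => s3.headObject(p).promise().
// aws-sdk v2 also offers s3.waitFor('objectExists', { Bucket, Key }).promise().
async function waitForObject(headObject, params, { attempts = 5, delayMs = 500 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      return await headObject(params); // resolves with the object metadata
    } catch (err) {
      const notFound = Boolean(err) && (err.statusCode === 404 || err.code === "NotFound");
      // Only a 404 is worth waiting on; anything else (e.g. 403) is rethrown immediately.
      if (!notFound || i === attempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```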

Thanks
Alan

AWS
Alan_L
answered 3 years ago

Hi Alan,

Our processing is as follows.

  1. Obtain S3 URLs. They are given to us via an s3.getSignedUrl('getObject', params) call. Usually each work item includes two URLs (and there can be many work items).
  2. Upload the files into a working S3 bucket via s3.upload(params).
  3. Once the files are uploaded (the code waits until the uploads complete), the process publishes an SNS message for downstream processing.
  4. Downstream processing checks whether a PDF file has already been submitted to Textract, and if not, issues a startDocAnalysis call.

This is where we occasionally see the error above.

The files are not encrypted. We are using the JavaScript SDK, if that matters.
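Steps 2 to 4 of this pipeline can be sketched with the v2 JavaScript SDK calls mentioned above. In this minimal sketch the S3, SNS, and Textract clients are injected (with aws-sdk v2 they would be `new AWS.S3()`, `new AWS.SNS()`, and `new AWS.Textract()`), so the flow can run against fakes. The function names, the `{ bucket, key }` message shape, the in-memory `alreadySubmitted` set, and the `FeatureTypes` choice are all assumptions for illustration. The ordering point is that the SNS message is published only after `upload(...).promise()` resolves.

```javascript
// Sketch of steps 2-4. Clients are injected; with aws-sdk v2 they would be
// new AWS.S3(), new AWS.SNS() and new AWS.Textract(). Names below are hypothetical.

// Steps 2 + 3: upload, wait for completion, then notify downstream.
async function uploadAndNotify({ s3, sns }, { bucket, key, body, topicArn }) {
  await s3.upload({ Bucket: bucket, Key: key, Body: body }).promise();
  // Publish only after the upload promise has resolved.
  await sns.publish({ TopicArn: topicArn, Message: JSON.stringify({ bucket, key }) }).promise();
}

// Step 4: build the StartDocumentAnalysis request from the SNS message body.
function buildAnalysisParams(message, featureTypes = ["TABLES", "FORMS"]) {
  const { bucket, key } = JSON.parse(message);
  return {
    DocumentLocation: { S3Object: { Bucket: bucket, Name: key } },
    FeatureTypes: featureTypes,
  };
}

// Step 4 (cont.): skip documents already submitted, otherwise start the job.
async function startIfNew({ textract }, message, alreadySubmitted) {
  const params = buildAnalysisParams(message);
  const key = params.DocumentLocation.S3Object.Name;
  if (alreadySubmitted.has(key)) return null;
  const { JobId } = await textract.startDocumentAnalysis(params).promise();
  alreadySubmitted.add(key);
  return JobId;
}
```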

Regards,
Albert

AlbertK
answered 3 years ago

Hi Albert,

Thank you for the details on how you prepare the data. We will forward this issue to the S3 team and work on an improvement.

Thanks
Alan

AWS
Alan_L
answered 3 years ago

I'm having exactly the same issue, with a similar data preparation flow.

answered a year ago

I found my problem: it looks like the Textract client in the SDK must be configured for the same region as the S3 bucket.
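A minimal sketch of that fix, assuming aws-sdk v2: look up the bucket's region with `s3.getBucketLocation` and create the Textract client in that region. GetBucketLocation returns an empty `LocationConstraint` for us-east-1 and the legacy value `EU` for eu-west-1, so the helper normalizes both. `regionFromLocationConstraint` and `textractForBucket` are hypothetical names, and the `AWS` module is injected so the sketch can run without the SDK installed.

```javascript
// Normalize the LocationConstraint returned by S3 GetBucketLocation to a region name.
function regionFromLocationConstraint(constraint) {
  if (!constraint) return "us-east-1"; // us-east-1 buckets report an empty/null constraint
  if (constraint === "EU") return "eu-west-1"; // legacy alias
  return constraint;
}

// Hypothetical helper: create a Textract client in the same region as the bucket.
// `AWS` is the aws-sdk v2 module (require('aws-sdk')), injected here for testability.
async function textractForBucket(AWS, bucket) {
  const s3 = new AWS.S3();
  const { LocationConstraint } = await s3.getBucketLocation({ Bucket: bucket }).promise();
  return new AWS.Textract({ region: regionFromLocationConstraint(LocationConstraint) });
}
```

If the region is fixed and known, simply passing `{ region: 'eu-west-2' }` to both the S3 and Textract clients achieves the same thing without the lookup.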

answered a year ago
