Getting InvalidS3ObjectException: Unable to get object metadata from S3

About two weeks ago, I started seeing the following error in our logs, produced by a "startDocAnalysis" call:
2021-06-09T03:52:54.215Z 5e1c9ff3-7b60-5522-b6bd-c6c9463127df ERROR InvalidS3ObjectException: Unable to get object metadata from S3. Check object key, region and/or access permissions.
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:688:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:690:12)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
code: 'InvalidS3ObjectException',
time: 2021-06-09T03:52:54.212Z,
requestId: '598d7ac4-00ac-4c25-b239-6c9deba0e3d1',
statusCode: 400,
retryable: false,
retryDelay: 26.7125437006851
}

Our code hasn't changed. It automatically retries after a few seconds, and the second "startDocAnalysis" call succeeds.
We are seeing this in the eu-west-2 region.
What is causing this behavior? I assume something changed in the underlying AWS infrastructure, either in S3 or in Textract.
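
The error object above carries `retryable: false`, which suggests the SDK will not repeat the call on its own, so the retry described here has to live in application code. A minimal sketch, assuming `startFn` is a hypothetical wrapper around your own `textract.startDocumentAnalysis(params).promise()` call (aws-sdk v2) and that retrying only on `InvalidS3ObjectException` is acceptable for your pipeline; `startWithRetry` and its option names are illustrative, not part of any SDK:

```javascript
// Default sleep: resolve after `ms` milliseconds.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call `startFn`, and if it fails with
// InvalidS3ObjectException, wait with exponential backoff and try again.
async function startWithRetry(startFn, {
  maxAttempts = 3,     // total attempts, including the first one
  baseDelayMs = 2000,  // delay before the 2nd attempt; doubles after that
  sleep = defaultSleep,
} = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await startFn();
    } catch (err) {
      const retryable = Boolean(err) && err.code === "InvalidS3ObjectException";
      if (!retryable || attempt >= maxAttempts) {
        throw err; // a different error, or we are out of attempts
      }
      // Wait 2 s, 4 s, 8 s, ... before the next attempt.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

For example: `const { JobId } = await startWithRetry(() => textract.startDocumentAnalysis(params).promise());`. Injecting `sleep` keeps the helper easy to test without real delays.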

Regards,
Albert

AlbertK
Asked 3 years ago

Hi Albert,

Thank you for choosing Textract. I looked at our logs, and for this request and two others we were unable to get the object metadata from S3. Could you check that the object key, region, and access permissions are set up correctly? If everything looks right and it happens again, feel free to open a support ticket and we will investigate further.

Thanks
Alan

AWS
Alan_L
Answered 3 years ago
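
Part of this checklist (object key and access permissions) can be checked automatically by sending a HEAD request for the object just before starting Textract. A minimal sketch, assuming an aws-sdk v2 `S3` client (or any object with the same `headObject(params).promise()` shape) is passed in; `checkObject` and its status labels are hypothetical names chosen for illustration:

```javascript
// Hypothetical pre-flight check: HEAD the object and classify the outcome.
async function checkObject(s3, bucket, key) {
  try {
    const head = await s3.headObject({ Bucket: bucket, Key: key }).promise();
    return { status: "ok", contentLength: head.ContentLength };
  } catch (err) {
    // HEAD responses have no body, so classify by HTTP status code.
    if (err.statusCode === 404) return { status: "not-found" }; // wrong key/bucket, or not visible yet
    if (err.statusCode === 403) return { status: "forbidden" }; // permissions (see note below)
    throw err; // anything else: let the caller decide
  }
}
```

Per the S3 `HeadObject` documentation, a missing key is reported as 403 rather than 404 when the caller lacks `s3:ListBucket`, so a `forbidden` result does not always mean the object itself is unreadable.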

Hi Alan,

The files to be processed were there. Our pipeline uploads them to an S3 bucket before Textract processing starts. And as I mentioned previously, our automatic retry kicked off a second Textract attempt, and that attempt succeeded.
This is new behavior; I started seeing it around May 25. Previously we didn't have this issue, and our processing pipeline hasn't changed.
It's also worth mentioning that this doesn't happen for every file we process. For example, if we submit 200 files for processing, only one or two PDFs hit this error. In some cases the error doesn't happen at all and everything works fine.

Regards,
Albert

AlbertK
Answered 3 years ago

Hi Albert,

I took a deeper look. We can see a 404 status code returned while trying to get the object metadata from S3. Could you let us know how you prepare the data (e.g., upload it to S3 and then call Textract), and whether the data is encrypted with a CMK or an S3-managed key? With that information, we can work with the S3 team to understand why the data was not available immediately. This may be a consistency or availability issue on the S3 side.

Thanks
Alan

AWS
Alan_L
Answered 3 years ago
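
The encryption question can be answered from the object itself: a `HeadObject` response includes `ServerSideEncryption` and, for KMS, `SSEKMSKeyId`. A small sketch, assuming you already have that response object (e.g. from `s3.headObject({ Bucket, Key }).promise()` in aws-sdk v2); `describeEncryption` and its `kind` labels are hypothetical:

```javascript
// Hypothetical helper: summarize how an object is encrypted from a
// HeadObject response.
function describeEncryption(head) {
  const sse = head.ServerSideEncryption; // e.g. "AES256" or "aws:kms"
  if (!sse) return { kind: "none" };
  if (sse === "AES256") return { kind: "sse-s3" }; // S3-managed key
  if (sse.startsWith("aws:kms")) {
    // KMS key (AWS managed or customer managed); readers of the object
    // also need kms:Decrypt on this key.
    return { kind: "sse-kms", keyId: head.SSEKMSKeyId };
  }
  return { kind: "other", value: sse };
}
```

Note that S3 has applied SSE-S3 encryption to all new objects by default since January 2023, so an upload made today without explicit encryption settings will usually report `AES256` rather than no encryption.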

Hi Alan,

Our processing works as follows:

  1. Obtain S3 URLs. They are given to us via an s3.getSignedUrl('getObject', params) call. Usually each work item includes two URLs (and there can be many work items).
  2. Upload the files into a working S3 bucket via s3.upload(params).
  3. Once the files are uploaded (the code waits until the uploads complete), the process publishes an SNS message for downstream processing.
  4. Downstream processing checks whether a PDF file has already been submitted to Textract and, if not, issues a startDocAnalysis call.

This is where we occasionally see the error above.

The files are not encrypted. We are using the JavaScript SDK, if that matters.

Regards,
Albert

AlbertK
Answered 3 years ago
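
Steps 2 to 4 of the pipeline above can be sketched as a single function. This is a minimal sketch, assuming aws-sdk v2 clients passed in as `s3` and `textract` and dropping the SNS hop for brevity; `uploadAndAnalyze` is a hypothetical name, the `FeatureTypes` values are an example choice, and the `waitFor("objectExists")` call is an extra safeguard that is not part of the original pipeline and is not a confirmed fix for this error:

```javascript
// Hypothetical sketch: upload, confirm the object is visible, start Textract.
// `s3` and `textract` are assumed to be aws-sdk v2 clients in the SAME region.
async function uploadAndAnalyze(s3, textract, { bucket, key, body }) {
  // Step 2: upload; resolves once S3 has accepted the object.
  await s3.upload({ Bucket: bucket, Key: key, Body: body }).promise();

  // Extra safeguard (assumption): poll HeadObject via the SDK waiter until
  // the object is reported as existing.
  await s3.waitFor("objectExists", { Bucket: bucket, Key: key }).promise();

  // Step 4: start asynchronous analysis of the PDF.
  const { JobId } = await textract
    .startDocumentAnalysis({
      DocumentLocation: { S3Object: { Bucket: bucket, Name: key } },
      FeatureTypes: ["TABLES", "FORMS"],
    })
    .promise();
  return JobId;
}
```

Passing the clients in as parameters keeps the function testable with stubs and makes it explicit which region each client uses.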

Hi Albert,

Thank you for describing how you prepare the data. We will forward this issue to the S3 team and work on an improvement.

Thanks
Alan

AWS
Alan_L
Answered 3 years ago

I'm having exactly the same issue, with the same sort of data preparation.

Answered 1 year ago

I found my problem: it looks like the Textract client must be configured for the same region as the S3 bucket.

Answered 1 year ago
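
A region mismatch like this can be caught before any Textract call. A minimal sketch, assuming aws-sdk v2 clients passed in as `s3` and `textract` (a v2 client exposes its region as `config.region`); `assertSameRegion` is a hypothetical helper:

```javascript
// Hypothetical guard: fail fast if the bucket and the Textract client are
// in different regions.
async function assertSameRegion(s3, textract, bucket) {
  const { LocationConstraint } = await s3
    .getBucketLocation({ Bucket: bucket })
    .promise();
  // GetBucketLocation returns an empty constraint for us-east-1 and the
  // legacy value "EU" for eu-west-1.
  const bucketRegion = !LocationConstraint
    ? "us-east-1"
    : LocationConstraint === "EU"
      ? "eu-west-1"
      : LocationConstraint;
  const textractRegion = textract.config.region;
  if (bucketRegion !== textractRegion) {
    throw new Error(
      `Bucket ${bucket} is in ${bucketRegion} but the Textract client uses ${textractRegion}`
    );
  }
  return bucketRegion;
}
```

Creating the client with an explicit region, e.g. `new AWS.Textract({ region: "eu-west-2" })` for a bucket in eu-west-2, avoids depending on whatever default region the environment provides.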
