Getting InvalidS3ObjectException: Unable to get object metadata from S3


About two weeks ago, I started seeing the following error message in the logs. It is produced by a "startDocAnalysis" call.
2021-06-09T03:52:54.215Z 5e1c9ff3-7b60-5522-b6bd-c6c9463127df ERROR InvalidS3ObjectException: Unable to get object metadata from S3. Check object key, region and/or access permissions.
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:688:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:690:12)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
code: 'InvalidS3ObjectException',
time: 2021-06-09T03:52:54.212Z,
requestId: '598d7ac4-00ac-4c25-b239-6c9deba0e3d1',
statusCode: 400,
retryable: false,
retryDelay: 26.7125437006851
}

The code that we use didn't change. It automatically issues a retry after a few seconds, and the second "startDocAnalysis" call succeeds.
We are seeing this in the eu-west-2 region.
What is causing this behavior? I assume there was a change in the underlying AWS infrastructure either for S3 or for Textract.

Regards,
Albert

AlbertK
Asked 3 years ago, 4704 views
7 Answers

Hi Albert,

Thank you for choosing Textract. I took a look at our logs, and it seems that for this request and two other requests we couldn't get the object metadata from S3. Could you check that the object key, region, and access permissions are set up correctly? If nothing is wrong and it happens again, feel free to open a ticket and we will investigate further.

Thanks
Alan

AWS
Alan_L
Answered 3 years ago

Hi Alan,

The files for processing were there. They were uploaded into an S3 bucket as part of our pipeline before the Textract processing was started. As I mentioned previously, we have an automatic retry that kicked off a second attempt at Textract processing, and the second attempt succeeded.
This is new behavior; I started seeing it around May 25 or so. Previously we did not have this issue (our processing pipeline didn't change).
Also worth mentioning: this doesn't happen for all files that we process. For example, if we submit, say, 200 files for processing, only one or two PDF files will produce this error. And in some cases the error doesn't happen at all and everything works just fine.
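For readers who want to reproduce the automatic retry described above, here is a minimal sketch, not the pipeline's actual code. The helper name `retryOnInvalidS3Object`, the attempt count, and the delay are all assumptions chosen for illustration. It retries only when the rejected error carries the `InvalidS3ObjectException` code shown in the log.

```javascript
// Hypothetical helper (not from the original pipeline): retries an async call
// only when it fails with the error code seen in the log above.
async function retryOnInvalidS3Object(call, { attempts = 3, delayMs = 2000 } = {}) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await call();
    } catch (err) {
      const retriable = Boolean(err) && err.code === 'InvalidS3ObjectException';
      // Any other error, or the final attempt, is surfaced to the caller unchanged.
      if (!retriable || attempt >= attempts) throw err;
      console.warn(`attempt ${attempt} failed with ${err.code}; retrying in ${delayMs} ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

With the AWS SDK for JavaScript v2, this could presumably wrap the Textract call as `retryOnInvalidS3Object(() => textract.startDocumentAnalysis(params).promise())`, assuming "startDocAnalysis" is a wrapper around that SDK method.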

Regards,
Albert

AlbertK
Answered 3 years ago

Hi Albert,

I took a deeper look. We can see a 404 status code in the error returned while trying to get the object metadata from S3. Could you please let us know how you prepare the data (e.g. the data is uploaded to S3, and then Textract is called), and whether the data is encrypted using a CMK or an S3-managed key? With that information, we will have more detail to discuss with the S3 team to understand why the data was not available immediately. It may be a consistency or availability issue.
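One way to answer the encryption question for a given object is to read its metadata. The sketch below assumes an AWS SDK for JavaScript v2 S3 client is passed in as `s3`; the function name `describeEncryption` and the returned labels are hypothetical. It relies on the `ServerSideEncryption` and `SSEKMSKeyId` fields of the `headObject` response.

```javascript
// Hypothetical helper: reports how an object is encrypted at rest, based on
// the HeadObject response. `s3` is expected to be an AWS SDK v2 S3 client.
async function describeEncryption(s3, bucket, key) {
  const head = await s3.headObject({ Bucket: bucket, Key: key }).promise();
  if (head.ServerSideEncryption === 'aws:kms') {
    // KMS-managed key; SSEKMSKeyId identifies the key when it is returned.
    return { type: 'kms', keyId: head.SSEKMSKeyId };
  }
  if (head.ServerSideEncryption === 'AES256') {
    return { type: 's3-managed' };
  }
  return { type: 'none-reported' };
}
```

A `headObject` call that itself fails with a 404 right after upload would also be a useful data point for the investigation above.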

Thanks
Alan

AWS
Alan_L
Answered 3 years ago

Hi Alan,

The processing that we have is as follows.

  1. Obtain S3 URLs. They are given to us via an s3.getSignedUrl('getObject', params) call. Usually there are two URLs given to us as part of the work item (there could be many work items).
  2. Upload the files into a working S3 bucket via s3.upload(params).
  3. Once the files are uploaded (the code waits until the uploads complete), the process publishes an SNS message for downstream processing.
  4. Downstream processing checks whether a PDF file has been submitted to Textract, and if not, issues a startDocAnalysis call.

This is where we occasionally see the aforementioned error.
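Steps 2 and 4 above can be sketched as follows, with one extra guard that is not part of the described pipeline: waiting until S3 reports the object as existing before calling Textract. This is an illustration under assumptions, not a confirmed fix. The function name `uploadThenAnalyze`, the `FeatureTypes` values, and the omission of the SNS step are choices made for the example, and `s3` and `textract` are expected to be AWS SDK for JavaScript v2 clients.

```javascript
// Sketch of steps 2 and 4 with an explicit existence check in between.
// `s3` and `textract` are AWS SDK v2 clients supplied by the caller;
// bucket, key, and body are placeholders.
async function uploadThenAnalyze(s3, textract, { bucket, key, body }) {
  // Step 2: upload and wait for the managed upload to finish.
  await s3.upload({ Bucket: bucket, Key: key, Body: body }).promise();

  // Extra guard (an assumption, not in the original pipeline):
  // the 'objectExists' waiter polls HeadObject until the key is visible.
  await s3.waitFor('objectExists', { Bucket: bucket, Key: key }).promise();

  // Step 4: start the asynchronous analysis job and return its id.
  const { JobId } = await textract.startDocumentAnalysis({
    DocumentLocation: { S3Object: { Bucket: bucket, Name: key } },
    FeatureTypes: ['TABLES', 'FORMS'], // assumed; pick what the job needs
  }).promise();
  return JobId;
}
```

If the error still appears with the waiter in place, that would suggest the cause is something other than upload timing, such as the key, region, or permissions.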

Files are not encrypted. We are using the JS SDK, if that matters.

Regards,
Albert

AlbertK
Answered 3 years ago

Hi Albert,

Thank you for providing the details of your data preparation. We will forward this issue to the S3 team and work on improving this.

Thanks
Alan

AWS
Alan_L
Answered 3 years ago

I'm having the exact same issue, with the same sort of data preparation.

Answered a year ago

I found my problem: it looks like the Textract SDK client must be configured for the same region as the bucket.
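A possible way to keep the two regions aligned is to look up the bucket's region and build the Textract client from it. The sketch below assumes an AWS SDK for JavaScript v2 S3 client is passed in as `s3`; the function names are hypothetical. It accounts for `getBucketLocation` returning an empty constraint for us-east-1 and the legacy value `EU` for eu-west-1.

```javascript
// Maps a GetBucketLocation constraint to a region name.
// An empty or null constraint means us-east-1; 'EU' is a legacy alias for eu-west-1.
function regionFromLocationConstraint(constraint) {
  if (!constraint) return 'us-east-1';
  if (constraint === 'EU') return 'eu-west-1';
  return constraint;
}

// Hypothetical helper: looks up the region of a bucket using an AWS SDK v2 S3 client.
async function resolveBucketRegion(s3, bucket) {
  const { LocationConstraint } = await s3.getBucketLocation({ Bucket: bucket }).promise();
  return regionFromLocationConstraint(LocationConstraint);
}
```

The result could then be used as `new AWS.Textract({ region: await resolveBucketRegion(s3, bucket) })`, so the Textract client always targets the bucket's region.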

Answered a year ago
