"Invalid S3 Object" Error with Immediate Textract Processing After S3 Upload

Hi all,

I'm facing an issue where I upload a file to an S3 bucket and then immediately invoke Amazon Textract's StartDocumentTextDetection. Sometimes, I get an "invalid S3 object" error, which disappears on delayed retries.

Does this suggest that S3's strong read-after-write consistency doesn't apply to immediate reads by Textract?

Any insights would be appreciated. Thanks!

/aav

asked 4 months ago · 166 views
2 Answers

Yes, it seems like you are running into consistency issues with S3 when invoking Textract immediately after uploading an object.

S3 has different consistency models for reads:

  • Strong consistency - returns the latest version of the object every time.

  • Eventual consistency - may temporarily return an older version of the object until the latest writes have propagated.

By default, S3 offers read-after-write consistency for PUTs of new objects. So a new object PUT should be immediately readable.

However, there can still be lag in propagating writes across S3 servers. So Textract may retrieve an older version or invalid state of the object if invoked very quickly after the PUT.

A few ways to deal with this:

  • Add a short delay (1-2 seconds) after the upload before invoking Textract.

  • Use S3 object versioning and pass the version ID returned by the upload to Textract.

  • Use S3 replication to replicate to another region first, then read from the replica.

  • Retry the Textract call with exponential backoff if you get "invalid S3 object" errors.
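The versioning option above could look roughly like this. It is only a sketch: `put_and_locate` is a made-up helper name, `s3` is assumed to be a boto3 S3 client, and the bucket is assumed to have versioning enabled (otherwise the upload response carries no `VersionId` and the helper falls back to an unversioned location).

```python
def put_and_locate(s3, bucket, key, body):
    """Upload body and return a Textract DocumentLocation for that exact write.

    Hypothetical helper; `s3` is expected to behave like a boto3 S3 client.
    """
    response = s3.put_object(Bucket=bucket, Key=key, Body=body)
    s3_object = {"Bucket": bucket, "Name": key}
    # VersionId is only present when versioning is enabled on the bucket.
    version_id = response.get("VersionId")
    if version_id:
        s3_object["Version"] = version_id
    return {"S3Object": s3_object}
```

The returned dictionary can then be passed as `DocumentLocation` to `start_document_text_detection`, so Textract is asked for the version that was just written rather than "whatever is current".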

So in summary, add a bit of waiting/retry to account for S3 consistency lag when processing right after uploads.
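For the retry part, a minimal sketch of exponential backoff with jitter is below. The helper names (`error_code`, `call_with_backoff`, `start_text_detection`) and the attempt/delay values are my own assumptions, not anything prescribed by AWS; the error code `InvalidS3ObjectException` is the one Textract reports for this failure.

```python
import random
import time


def error_code(exc):
    """Return the AWS error code carried by a botocore ClientError, if any."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def call_with_backoff(call, retryable_codes=("InvalidS3ObjectException",),
                      max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run call(); on a retryable AWS error, wait and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on unrelated errors or once the attempts are used up.
            if error_code(exc) not in retryable_codes or attempt == max_attempts:
                raise
            # Full jitter: wait a random time up to base_delay * 2^(attempt-1).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


def start_text_detection(bucket, key):
    """Start a Textract job for s3://bucket/key, retrying invalid-object errors."""
    import boto3  # imported here so the helpers above have no AWS dependency

    textract = boto3.client("textract")
    response = call_with_backoff(
        lambda: textract.start_document_text_detection(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
        )
    )
    return response["JobId"]
```

Only the specific error code is retried, so permission or throttling problems still surface immediately instead of being hidden behind the backoff loop.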

AWS
Saad
answered 4 months ago

Hello Saad,

Thank you for the quick response.

Although I clearly understand the practical part of your answer, and in any case we have a way to retry failed Textract requests, I don't get the conceptual part.

How do these two consistency models coexist? Does it mean that Textract uses an eventually consistent read, while a regular S3 GetObject is strongly consistent? (I don't see a GetObject parameter that would affect read consistency.)

Also, as I read on the AWS site: "After a successful write of a new object, or an overwrite or delete of an existing object, any subsequent read request immediately receives the latest version of the object."

Does it mean that consistency guarantees are provided only when an existing object is updated (readers will never see the old version), but not when an object is created (readers may still have to wait until the newly created object has propagated)?

But this contradicts the statement from the site quoted above. I'm a bit confused.
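To narrow this down, one could check right after the upload whether a plain S3 read already sees the new object. A rough sketch, where `is_visible` and `upload_and_check` are hypothetical helper names and `s3` is assumed to be a boto3 S3 client:

```python
def is_visible(s3, bucket, key, expected_etag):
    """Return True if a HEAD on the object sees the ETag returned by the upload."""
    try:
        head = s3.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        response = getattr(exc, "response", None) or {}
        code = response.get("Error", {}).get("Code")
        # A missing object surfaces as a "not found" style error code.
        if code in ("404", "NoSuchKey", "NotFound"):
            return False
        raise
    return head["ETag"] == expected_etag


def upload_and_check(s3, bucket, key, body):
    """Upload body, then immediately test whether S3 itself serves the new object."""
    put = s3.put_object(Bucket=bucket, Key=key, Body=body)
    return is_visible(s3, bucket, key, put["ETag"])
```

If this returns True every time while the immediate Textract call still fails now and then, that would suggest the error does not come from S3 read consistency as seen by GetObject/HeadObject.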

/aav

answered 4 months ago
