Need suggestions on how to look up existing objects on S3


Hi, we have incoming files every day, and we upload them to S3. The files can be duplicates of files from earlier days, so we need to check whether they already exist before uploading.

Our old approach is to save the filename to SDB after uploading a new file, so that we can use an SDB query to look up existing files.

We now want to switch to the S3 HEAD Object API to check for existence.

We need suggestions:
(1) Is there a better way besides SDB and the S3 HEAD API? Is there any newer S3 API to check existence in bulk?
(2) Is the S3 HEAD API good enough for our use case? We need to look up ~200 filenames every hour during the day.

Thanks in advance!

xpli
asked 5 years ago, 181 views
2 Answers
Accepted Answer

Hi,

I use S3 pretty extensively, but I'm not employed by Amazon, so take this for what it's worth.

That number of HEAD requests per hour is totally fine; it is nowhere near enough to stress the service.

There isn't really a better way to check for existence of a random key. However, if you have a lot of keys at once and the keys share a path-like structure, you could do better by executing a list request on the common prefix and checking the contents.
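
To illustrate the prefix-listing idea, here is a minimal sketch, assuming Python with boto3. The function names and the `incoming/` prefix in the usage note are hypothetical; `s3_client` is assumed to be what `boto3.client("s3")` returns, and the only S3 call used is the `list_objects_v2` paginator.

```python
def existing_keys_under_prefix(s3_client, bucket, prefix):
    """Return the set of keys in `bucket` that start with `prefix`."""
    keys = set()
    # list_objects_v2 returns at most 1000 keys per call; the paginator
    # follows the continuation tokens for us.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix.
        for obj in page.get("Contents", []):
            keys.add(obj["Key"])
    return keys


def split_new_and_existing(s3_client, bucket, prefix, candidate_keys):
    """Split candidate keys into (new, already_uploaded), keeping input order."""
    existing = existing_keys_under_prefix(s3_client, bucket, prefix)
    new = [k for k in candidate_keys if k not in existing]
    already_uploaded = [k for k in candidate_keys if k in existing]
    return new, already_uploaded
```

Something like `split_new_and_existing(boto3.client("s3"), "my-bucket", "incoming/", keys)` would then cover a whole batch with one listing. This only pays off when the prefix is narrow enough that listing it is cheaper than one HEAD per key.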

Keep in mind that S3 may not be immediately consistent. You should read the docs to fully understand the impact on your particular use case, but a couple of things stick out:

  1. If you do a GET/HEAD before uploading, then PUT, then GET/HEAD again, that last response is eventually consistent, i.e. it is not guaranteed to report that the object exists.
  2. LIST requests are eventually consistent.

Given that, and given the number of objects you expect to check, I would recommend keeping it simple and just doing HEAD, maybe with time-based retries to clear up the eventual consistency issue. If you make the wrong decision, you simply re-upload a duplicate, which is not data loss, just extra cost to you, so if it happens every once in a while it shouldn't be too big of a deal.
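
A HEAD check with time-based retries could look roughly like the sketch below, assuming Python with boto3. The function names, the retry count, and the delay are hypothetical choices for illustration; `s3_client` is assumed to be what `boto3.client("s3")` returns. The error is inspected through its `response` attribute, which is how botocore's `ClientError` exposes the error code, so the sketch does not import botocore directly.

```python
import time

# Error codes treated as "object does not exist". A HEAD on a missing key
# surfaces in boto3 as a ClientError whose error code is "404".
NOT_FOUND_CODES = {"404", "NoSuchKey", "NotFound"}


def object_exists(s3_client, bucket, key):
    """Return True if a HEAD on the key succeeds, False if it is not found."""
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except Exception as exc:
        response = getattr(exc, "response", None)
        code = response.get("Error", {}).get("Code") if isinstance(response, dict) else None
        if code in NOT_FOUND_CODES:
            return False
        # Anything else (403, throttling, network errors) is not an answer
        # about existence, so let the caller see it.
        raise


def object_exists_with_retries(s3_client, bucket, key, attempts=3,
                               delay_seconds=1.0, sleep=time.sleep):
    """Repeat the HEAD check a few times before concluding the key is absent."""
    for attempt in range(attempts):
        if object_exists(s3_client, bucket, key):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before asking again
    return False
```

An upload would then be skipped whenever `object_exists_with_retries(s3, bucket, key)` returns True. Retrying only on "not found" keeps the cost low for keys that already exist, at the price of a short delay for files that are genuinely new.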

Hope this helps!

answered 5 years ago

Thank you. I agree that keeping it simple is the right way to start. Your suggestions are very much appreciated.

xpli
answered 5 years ago
