Best Way to Keep Two S3 Buckets in Sync


Here is the setup.

  • BUCKET_1 - target of a DMS replication task with an on-premises source endpoint and a table preparation mode of "DROP_AND_CREATE".
  • BUCKET_2 - kept in sync by a Lambda function driven by events from BUCKET_1; it is also the source endpoint for a migration task to an Aurora RDS instance.

BUCKET_1 has Lambda triggers defined for the following events (in order to copy and delete objects in BUCKET_2): s3:ObjectCreated:* and s3:ObjectRemoved:*.

The goal is to keep BUCKET_2 in perfect sync with BUCKET_1.

Recently, we have found that the ObjectRemoved* and ObjectCreated* events do not always arrive in chronological order. I found documentation stating that the order in which S3 event notifications are delivered to Lambda is not guaranteed. This leaves a situation where a file in BUCKET_2 can be deleted right after creation (the create and delete arrive out of order).

I have been researching workarounds. One would be to look up the last-modified time of the object when the event is ObjectRemoved*, and if it is within 2 minutes (or some reasonable window), skip the delete.

The other option would be to create a CloudWatch Events rule like the one below, bind it to a Lambda, check whether the task's eventId = 'DMS-EVENT-0069', and then clean up all associated "dbo" files in BUCKET_2:
{
  "source": [
    "aws.dms"
  ],
  "detail-type": [
    "DMS Replication State Change"
  ]
}

My concern with the above is whether there will be enough lag between DMS-EVENT-0069 and the start of the data transfer to allow emptying BUCKET_2 of all its contents.
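If we went the CloudWatch route, the handler might look roughly like the sketch below. The detail field name ("eventId"), the bucket name, and the prefix are assumptions to verify against a captured event payload, not confirmed facts.

```python
def is_full_load_start(detail, event_id="DMS-EVENT-0069"):
    # Pure check on the rule's event detail. The field name "eventId" is an
    # assumption -- inspect a real DMS state-change event to confirm it.
    return detail.get("eventId") == event_id

def empty_prefix(bucket, prefix):
    # Delete every object under `prefix` in `bucket`, in batches of up to 1000.
    import boto3  # available in the Lambda runtime
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3.delete_objects(Bucket=bucket, Delete={"Objects": keys})

def handler(event, context):
    # EventBridge delivers the matched DMS payload under "detail".
    if is_full_load_start(event.get("detail", {})):
        empty_prefix("BUCKET_2", "dbo/")  # placeholder bucket and prefix
```

Note that whether this runs to completion before DMS begins writing is exactly the lag question above; the sketch does not answer that.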

We will have up to 450 tasks and 300 buckets supporting the replication of 150 databases, so I am looking for a best practice solution to ensure that BUCKET_1 and BUCKET_2 are in perfect sync. This is critical for replication.

Perhaps there are better options to ensure two buckets are in sync?

UPDATE: Not wanting to persist sequencers (our solution has no persistent storage), we are leaning toward the following approach. This will only work if the ObjectCreated* event fires after the object has been created and the ObjectRemoved* event fires after the object has been deleted. No other processes will touch these objects; just DMS and the Lambda.

IN BUCKET_1, ObjectRemoved* event raised during full load (DROP_AND_CREATE), Lambda handler:
    IF BUCKET_2 has an object with the same name
        GET bucket_2_object_creation_date
        IF time_span_in_minutes(now - bucket_2_object_creation_date) > 2
            DELETE object
        ELSE
            -- Object was created by the same Data Migration Task instance; leave it there.
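As a rough Python rendering of that pseudocode (the two-minute window, the bucket name, and the key handling are placeholders, and head_object's LastModified stands in for the creation date, since S3 does not expose a separate creation time):

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(minutes=2)  # the "reasonable time frame" from the pseudocode

def should_delete(created_at, now, grace=GRACE):
    # Pure decision: delete only if the BUCKET_2 copy is older than the grace
    # window. A younger object was (re)created by the same full-load run.
    return (now - created_at) > grace

def handler(event, context):
    # ObjectRemoved* handler for BUCKET_1 (sketch; bucket name is a placeholder).
    import boto3
    import botocore.exceptions
    s3 = boto3.client("s3")
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        try:
            head = s3.head_object(Bucket="BUCKET_2", Key=key)
        except botocore.exceptions.ClientError:
            continue  # no matching object in BUCKET_2, nothing to do
        if should_delete(head["LastModified"], datetime.now(timezone.utc)):
            s3.delete_object(Bucket="BUCKET_2", Key=key)
```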
  • Just curious if you have already looked at S3 Same-Region replication?

  • @Gokul - Thanks for the link. I am sure event order is preserved in the article you posted. This may be an ideal solution; however, we have two small data transformations in the Lambda that would have to be refactored if we used it :(

  • @Gokul - It appears that versioning has to be enabled on the source bucket for S3 Same-Region Replication. Versioning is not compatible with a bucket that is the target of a DMS task; the DMS documentation for S3 endpoints states "Don't enable versioning for S3".

2 Answers
Accepted Answer

For the create-event Lambda, add code to tag the object once it has been processed/replicated. For the delete event, send the event to SQS first. Subscribe the delete Lambda to the queue and only process the delete if the create Lambda has added the tag. If the tag is present, process the delete and remove the message from the queue. You can then adjust the queue's visibility timeout to give the create time to finish.
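One possible shape for the delete side of this pattern is sketched below; the tag key and bucket name are placeholders. Raising on an untagged object leaves the message on the queue, so SQS redelivers it after the visibility timeout, by which time the create Lambda should have tagged the copy.

```python
import json

PROCESSED_TAG = "replicated"  # hypothetical tag key set by the create Lambda

def has_processed_tag(tag_set, tag_key=PROCESSED_TAG):
    # Pure check on a TagSet as returned by get_object_tagging.
    return any(t["Key"] == tag_key for t in tag_set)

def delete_handler(event, context):
    # SQS-subscribed delete Lambda (sketch). Each SQS message body carries the
    # original S3 ObjectRemoved* notification.
    import boto3
    s3 = boto3.client("s3")
    for msg in event["Records"]:
        s3_event = json.loads(msg["body"])
        for record in s3_event.get("Records", []):
            key = record["s3"]["object"]["key"]
            tags = s3.get_object_tagging(Bucket="BUCKET_2", Key=key)["TagSet"]
            if has_processed_tag(tags):
                s3.delete_object(Bucket="BUCKET_2", Key=key)
            else:
                # Create Lambda has not finished; fail so SQS retries later.
                raise RuntimeError(f"{key} not yet replicated; retrying via SQS")
```

Note that a raised error retries the whole batch, so a batch size of 1 (or SQS partial-batch responses) keeps the retry behavior clean.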

kentrad (AWS, Expert)
answered a year ago
  • This gave me a great idea and will be used to solve the problem. Thanks for taking the time to respond!


Have you looked at Same-Region and Cross-Region Replication to keep the buckets in sync?

https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication.html

AWS
answered a year ago
