Here is the setup.
- BUCKET_1 - target of a DMS replication task whose source endpoint is on-premises, with table preparation mode "DROP_AND_CREATE".
- BUCKET_2 - kept in sync by a Lambda function triggered by events in BUCKET_1; it is also the source endpoint for a migration task into an Aurora RDS instance.
BUCKET_1 has Lambda triggers defined for the events below (so the Lambda can copy and delete the corresponding objects in BUCKET_2):
s3:ObjectCreated:*
s3:ObjectRemoved:*
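For reference, the trigger wiring is roughly the following bucket notification configuration (a sketch; the function ARN is a placeholder):

```json
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:REGION:ACCOUNT:function:sync-bucket-2",
      "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
    }
  ]
}
```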
The goal is to keep BUCKET_2 in perfect sync with BUCKET_1.
Recently, we found that the ObjectRemoved:* and ObjectCreated:* events do not always arrive in chronological order. I found documentation stating that the order in which S3 event notifications are delivered to Lambda is not guaranteed. This leaves a situation where a file in BUCKET_2 can be deleted right after it is created, because the create and delete notifications arrived out of order.
I have been researching workarounds. One would be to look up the last-modified time of the object when the event is ObjectRemoved:*, and if it falls within the last 2 minutes (or some other reasonable window), skip the delete.
The other option would be to create a CloudWatch Events rule like the one below, bind it to a Lambda that checks whether the task's eventid = 'DMS-EVENT-0069', and then clean up all associated "dbo" files in BUCKET_2:
{
  "source": ["aws.dms"],
  "detail-type": ["DMS Replication State Change"]
}
My concern with the above is whether there will be enough lag time between DMS-EVENT-0069 and the start of data transfer to allow emptying BUCKET_2 of all contents.
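For what it's worth, the cleanup Lambda behind that rule could look roughly like the sketch below. The event-detail key ("eventId"), the bucket name, and the "dbo" prefix are all illustrative assumptions, not the actual DMS payload shape, so check them against a real event before relying on this:

```python
def is_cleanup_trigger(detail, event_id="DMS-EVENT-0069"):
    """Return True when the DMS state-change event is the one that
    should trigger emptying BUCKET_2. The 'eventId' key is an
    assumption about the event shape; adjust to the real payload."""
    return detail.get("eventId") == event_id


def empty_prefix(bucket, prefix="dbo"):
    """Delete every object under the given prefix. Each page from
    list_objects_v2 holds at most 1000 keys, which is also the
    delete_objects batch limit, so one delete call per page works."""
    import boto3  # deferred so the pure helper above is testable offline
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3.delete_objects(Bucket=bucket, Delete={"Objects": keys})


def handler(event, context):
    if is_cleanup_trigger(event.get("detail", {})):
        empty_prefix("BUCKET_2")
```

Deleting in batches rather than object-by-object matters at this scale, but it does not remove the race you describe: if the full load starts writing before the cleanup finishes listing, fresh objects can be swept up.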
We will have up to 450 tasks and 300 buckets supporting the replication of 150 databases, so I am looking for a best-practice solution that keeps BUCKET_1 and BUCKET_2 in perfect sync. This is critical for replication.
Perhaps there are better options for keeping two buckets in sync?
UPDATE: Not wanting to persist sequencers, due to the lack of persistent storage in our solution, we are leaning toward the following approach (it will only work if the ObjectCreated:* event is fired after the object has been created and the ObjectRemoved:* event is fired after the object has been deleted). No other process touches these objects, just DMS and the Lambda.
Lambda handler for the BUCKET_1 ObjectRemoved:* event (raised during a DROP_AND_CREATE full load):
IF BUCKET_2 has an object with the same key
    GET bucket_2_object_creation_date
    IF time_span_in_minutes(now - bucket_2_object_creation_date) > 2
        DELETE the object
    ELSE
        -- object was created by the same Data Migration Task instance, leave it there
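Under those assumptions, the pseudocode above could be sketched in Python roughly as follows. The bucket name and the 2-minute grace window are carried over from the plan; using the object's LastModified as its creation date is an assumption that holds here because DMS writes each object once:

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(minutes=2)


def should_delete(creation_time, now, grace=GRACE):
    """Delete only objects older than the grace window; a younger
    object was just written by the same full load, and the
    ObjectRemoved:* notification simply arrived out of order."""
    return (now - creation_time) > grace


def handler(event, context):
    # Sketch of the BUCKET_1 ObjectRemoved:* handler. boto3 is
    # imported lazily so the decision logic stays testable offline.
    import boto3
    s3 = boto3.client("s3")
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        try:
            head = s3.head_object(Bucket="BUCKET_2", Key=key)
        except s3.exceptions.ClientError:
            continue  # no object with the same key in BUCKET_2
        if should_delete(head["LastModified"], datetime.now(timezone.utc)):
            s3.delete_object(Bucket="BUCKET_2", Key=key)
```

One practical caveat: object keys in S3 event records are URL-encoded (spaces arrive as "+"), so a real handler should run the key through urllib.parse.unquote_plus before calling head_object.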
Just curious if you have already looked at S3 Same-Region replication?
@Gokul - Thanks for the link. I am sure event order is preserved in the approach described in the article you posted. This may be an ideal solution; however, we have two small data transformations in the Lambda that would have to be refactored if we use this solution :(
@Gokul - It appears that versioning has to be enabled on the source bucket for S3 Same-Region Replication, and versioning is not compatible with a bucket that is the target of a DMS task. The DMS documentation for S3 endpoints states, "Don't enable versioning for S3".