OpenSearch Cross-Cluster Replication


Understanding active-passive cross-cluster replication for AWS OpenSearch [1] is pretty straightforward. A leader index in the active cluster replicates everything to a follower index in the passive cluster. If a disaster recovery (DR) event occurs, such as a region failure, we can continue to serve read operations from the follower index on the passive cluster.

However, the documentation does not cover the following three points.

  1. How do we handle write operations that arrive while the active cluster is down? Is it possible to write to the passive cluster during this time and have the primary catch up when it comes back online? Or are only reads possible during this time, so that we need to batch process all the writes once the active cluster comes back up?

  2. When the active cluster comes back online, does OpenSearch automatically start redirecting search queries back to the active cluster instead of the passive cluster?

  3. In addition, if we wish to test the DR failover setup for OpenSearch, is it possible to simulate an active cluster failure and test it?

[1] https://docs.aws.amazon.com/opensearch-service/latest/developerguide/replication.html

asked 2 years ago, 1502 views
1 Answer

Hi. That's a great question. Here is the clarification:

  1. Currently, replication in OpenSearch works in an active-passive model. The follower index does not accept write traffic and is open only for search. Yes, once the leader cluster is back online, writes have to be made on the leader (active) cluster.
  2. Each cluster (leader and follower) has its own endpoint, and OpenSearch does not redirect search traffic. Search traffic continues to go to whichever search endpoint is configured at the client.
  3. Inducing a failure directly on the managed service nodes is not possible. Instead, follow these steps to fail over to the follower cluster:
     a. Terminate the connection between the follower and leader clusters (Connection APIs).
     b. Trigger a stop on all the follower indices that were being replicated: https://opensearch.org/docs/latest/replication-plugin/get-started/#stop-replication
     c. Once replication is stopped on all the indices, the follower indices should be able to take write traffic.
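
The stop-replication step in point 3 could be scripted along these lines. This is only a minimal sketch, not an official procedure: it assumes the `POST /_plugins/_replication/<follower-index>/_stop` call described at the link above, and the endpoint URL, index names, and function names are hypothetical. It only builds the requests and does not send them. On a managed domain they would still need to be authenticated (for example signed, or sent with your domain's credentials) before sending.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_stop_replication_request(follower_endpoint: str, index: str) -> Request:
    """Build (but do not send) a stop-replication request for one follower index.

    follower_endpoint is a hypothetical follower-domain URL; authentication
    is intentionally left out of this sketch.
    """
    base = follower_endpoint.rstrip("/")
    url = f"{base}/_plugins/_replication/{quote(index, safe='')}/_stop"
    return Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # the stop API takes an empty JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_failover_requests(follower_endpoint: str, indices: list[str]) -> list[Request]:
    """One stop request per follower index that was being replicated."""
    return [build_stop_replication_request(follower_endpoint, i) for i in indices]


if __name__ == "__main__":
    # Hypothetical endpoint and index names, for illustration only.
    for req in build_failover_requests("https://follower.example.com", ["orders", "logs-2023"]):
        print(req.get_method(), req.full_url)
```

Note that, per the replication plugin documentation, replication cannot be resumed on an index after it has been stopped, so it is worth rehearsing this against test indices first.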

Let us know if you have any further questions.

AWS
Varun_S
answered 2 years ago
