S3 Hudi Replication and Failover


We have two regions, with S3 replication and replica modification sync configured on the prefix where the Hudi dataset is located. Hudi writes happen in a single region only.

  • Will S3 replication maintain a consistent Hudi dataset in the replicated region?
  • If we move Hudi writes to the replicated region (failover), will the Hudi dataset stay consistent in the original region, maintained by replica modification sync from region 2 (failback)?
1 Answer

Amazon S3 Cross-Region Replication has no knowledge of dependencies between objects, and it does not guarantee ordering: an older object is not necessarily replicated before a newer one. As a result, Hudi metadata can be replicated before the data files it references, and consumers in the target region will then fail when they try to read data objects that have not arrived yet.
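To illustrate the failure mode, here is a minimal sketch of a check that a commit seen in the replica region has all of its data files present. It assumes the commit metadata is JSON with a `partitionToWriteStats` map whose entries carry a `path` relative to the table base path (the layout used by copy-on-write tables in Hudi 0.x; other versions or table types may differ). The function name `missing_data_files` and its parameters are hypothetical, and the set of replica keys would come from your own S3 listing.

```python
import json


def missing_data_files(commit_json, replica_keys, table_prefix):
    """Return data files referenced by a commit that are absent from the replica.

    commit_json  -- contents of a completed commit file under .hoodie/
                    (assumed to be JSON, see the note above)
    replica_keys -- set of object keys currently listed in the replica bucket
    table_prefix -- S3 key prefix of the Hudi table base path
    """
    metadata = json.loads(commit_json)
    prefix = table_prefix.rstrip("/")
    missing = []
    # Each partition maps to a list of write stats, one per file written.
    for write_stats in metadata.get("partitionToWriteStats", {}).values():
        for stat in write_stats:
            path = stat.get("path")
            if path is None:
                continue
            key = f"{prefix}/{path}"
            if key not in replica_keys:
                missing.append(key)
    return sorted(missing)
```

A non-empty result means the commit is visible in the replica before its data, which is exactly the window in which readers fail.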

To ensure consistent replication, you need a pipeline written in Spark or Flink that reads from the source region and writes to the target region. In that case, the transaction log on the target may differ from the one on the source if the pipeline runs at a different frequency than the source writer.
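One way such a pipeline could look in PySpark is sketched below, using a Hudi incremental query on the source table and an upsert into the target table. This is an assumption about the approach, not a tested recipe: the function names, the table name, and the key fields are placeholders, the caller must supply a SparkSession with the Hudi bundle available, and hard deletes on the source are not covered.

```python
def incremental_read_options(begin_instant):
    """Hudi read options for pulling records committed after begin_instant."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


def upsert_write_options(table_name, record_key, precombine_field):
    """Hudi write options for upserting into the target table."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def replicate_increment(spark, source_path, target_path, begin_instant,
                        table_name, record_key, precombine_field):
    """Copy changes made after begin_instant from source_path to target_path.

    spark is an existing SparkSession; the paths are the table base paths
    in the source and target regions (for example s3:// URIs).
    """
    changes = (
        spark.read.format("hudi")
        .options(**incremental_read_options(begin_instant))
        .load(source_path)
    )
    # Drop Hudi's own meta columns so the target writer regenerates them.
    meta_columns = [c for c in changes.columns if c.startswith("_hoodie_")]
    changes = changes.drop(*meta_columns)
    (
        changes.write.format("hudi")
        .options(**upsert_write_options(table_name, record_key, precombine_field))
        .mode("append")
        .save(target_path)
    )
```

Because each run writes its own commit on the target, several source commits may collapse into one target commit, which is why the two transaction logs can differ.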

answered 7 days ago