Need information regarding Aurora AZ Failure

0

I need information about aurora(PostgreSQL) AZ failure . Just to give some context - I am trying to perform az failure by using blackhole NACL (blocking ingress & egress traffic )and validating aurora az failure as part of the chaos experiment.

aurora DB is running on us-east-1

instance typeaz
primary/writer DBus-east-1c
reader DBus-east-1b

Now if we apply blackhole NACL to the us-east-1c subnet, then ideally reader DB instance should be promoted as the primary DB at us-east-1b and serve the traffic without interruptions, but what I see is no change is happening on the DB instance.

It would be helpful if anyone can clarify DB instance behavior of az failure Aurora PostgreSQL.

asked 2 years ago583 views
1 Answer
1
Accepted Answer

Hi there,I understand you are performing a failover on your Aurora PostureSQL cluster and you are not seeing any changes on your DB instance. Please correct me if I misunderstood.

A network access control list (ACL) is an optional layer of security for your VPC that acts as a firewall for controlling traffic in and out of one or more subnets. In your case, I understand that you wanted to use the blackhole NACL as a way to failover your AZ. However, the blackhole created only blocks the DB instances’ reachability, but doesn’t make it unavailable. 

Failover occurs in the event of (but not limited to) loss of availability in primary availability zone, loss of network connectivity to primary, compute unite failure on primary and storage failure on primary.As a result, you are unable to see any changes on your DB instance.Also note that being unable to connect to the instance does not result on the instance being unavailable.

If the primary instance in a DB cluster fails, Aurora automatically fails over in the following order: i) if Aurora read replicas are available it will then promote an existing read replica to the new primary instance. ii)if no read replicas are available, then you will have to create a new primary instance. If the DB cluster has one or more Aurora Replicas, then an Aurora Replica is promoted to the primary instance during a failure event. A failure event results in a brief interruption, during which read and write operations fail with an exception.

For failover I would suggest using failover with amazon Aurora by following this link[1].

I have attached documents to more information in the reference.

I hope this was helpful.

Reference

[1]https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.Promote

[2]https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZSingleStandby.html

[3]https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZSingleStandby.html

[4] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database-disaster-recovery.html

Karabo
answered 2 years ago
profile picture
EXPERT
reviewed a month ago
  • Thanks , got clarifications.

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions