EBS Snapshot and Volume creation


We are designing a resilient system that can survive a storage failure, and I would like to know whether this is achievable from an infrastructure standpoint. The system will consist of ELB -> ALB -> EC2. The Auto Scaling group will have two pools, one warm and one active, each containing at most one instance. To achieve high resiliency, we plan to snapshot the active instance's EBS volume every 3 minutes and keep a warm instance ready to go in the warm pool. A Lambda function will run every 3 minutes, pull the latest snapshot, and attach a volume created from it to the warm pool instance. That way, if the active instance goes down at any time, the warm instance can come up with the latest snapshot data. In theory this looks doable, but are there any technical issues we might run into? Also, how long does it take to build a volume from a 500 GB snapshot?

Other info: a) Only ~5 KB/sec of data is written, and only during business hours. b) The EBS volume will hold around 500 GB of data.

Thanks

  • Further information is needed to advise on the right architecture/solution: Is the solution meant to be resilient within a single Availability Zone, or across the region where the application is deployed? Is the data stored in a folder/file system? Are the EC2 instances behind the ELBs in different AZs? Is the operating system of the EC2 instances Linux or Windows?

  • Hi @Nuno_Q, please find the answers below. Is the solution meant to be resilient within a single Availability Zone, or across the region where the application is deployed? -- Multi-AZ only. Is the data stored in a folder/file system? -- The data will be stored in multiple folders. Are the EC2 instances behind the ELBs in different AZs? -- Yes. Is the operating system of the EC2 instances Linux or Windows? -- Linux (RHEL8).

1 Answer

Hello,

This is not a good system design for resiliency, and it is going to get expensive fast. Auto Scaling is designed for stateless instances, meaning that ideally no important data should be stored directly on the EBS volume of an ASG instance. See the alternatives in the second half of this answer and pick whichever fits your specific application/architecture.


Issues I see with it:

  1. Snapshot costs + time
  2. A new warm pool instance would have to be launched each time you make a new snapshot/AMI, incurring EC2 running costs while it bootstraps
  3. There's only 1 running instance, so no matter how fast the warm pool instance comes up, there will still be downtime while the failover happens (this might be acceptable for your use case, but it is something to be aware of)
  4. Any data written since the last snapshot would still be lost
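
To put rough numbers on issues 1 and 4, here is a back-of-the-envelope sketch using only the figures from the question (~5 KB/sec of writes, a snapshot every 3 minutes). The function names are made up for illustration, and the loss figure ignores the extra time the Lambda needs to create and attach a volume, so treat it as a lower bound on exposure rather than a guarantee.

```python
# Figures from the question; everything else is derived arithmetic.
WRITE_RATE_KB_PER_SEC = 5    # ~5 KB/sec during business hours
SNAPSHOT_INTERVAL_MIN = 3    # one snapshot every 3 minutes


def max_unsnapshotted_kb(write_rate_kb_per_sec: int, interval_min: int) -> int:
    """Data written between two consecutive snapshots, in KB.

    If the active instance fails just before the next snapshot starts,
    roughly this much data is missing from the latest snapshot.
    """
    return write_rate_kb_per_sec * interval_min * 60


def snapshots_per_day(interval_min: int) -> int:
    """Number of snapshots started in 24 hours at a fixed interval."""
    return (24 * 60) // interval_min


print(max_unsnapshotted_kb(WRITE_RATE_KB_PER_SEC, SNAPSHOT_INTERVAL_MIN), "KB at risk per interval")
print(snapshots_per_day(SNAPSHOT_INTERVAL_MIN), "snapshots per day")
```

So even in the best case, up to about 900 KB of writes can be lost on failover, and running the schedule around the clock means 480 snapshots (and 480 Lambda runs) per day.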

Alternative options:

  1. Create a data tier (RDS, DDB, or some other database/data storage layer) to store the data, so that instances can retrieve it and each new instance is identical. This way you don't need to worry about warm pools at all unless the application itself has a very long boot time. This would also let you use the dynamic scaling part of Auto Scaling to add/remove instances depending on load
  2. Add an EFS file system to the instance to store the data that might change, and have a user data script in the ASG's Launch Template that mounts the EFS file system on any new instance (this could also let you drop the warm pool if it's not needed).
  3. Have one or more secondary data volumes on the instance which are set not to be deleted when the instance is terminated. Attach these volumes to newly launched/started instances
  4. Run 2 smaller instances at once (if the application supports it), to be safer in case of an AZ outage or an issue with one of the instances (this could be done in addition to any of the above)
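
As an illustration of option 2, the following is a minimal sketch of how the user data for the Launch Template could be generated. It assumes a RHEL-family instance (so `yum` and `nfs-utils`), the standard EFS DNS name pattern, and the NFS mount options recommended in the EFS documentation. The file system ID `fs-12345678`, the region, the mount point `/mnt/efs`, and both function names are hypothetical placeholders, not values from the question.

```python
import base64


def build_efs_user_data(file_system_id: str, region: str,
                        mount_point: str = "/mnt/efs") -> str:
    """Return a bash user data script that mounts an EFS file system over NFS."""
    dns_name = f"{file_system_id}.efs.{region}.amazonaws.com"
    # NFS client options recommended for EFS.
    nfs_opts = ("nfsvers=4.1,rsize=1048576,wsize=1048576,"
                "hard,timeo=600,retrans=2,noresvport")
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "yum install -y nfs-utils",   # assumption: RHEL-family AMI
        f"mkdir -p {mount_point}",
        f"mount -t nfs4 -o {nfs_opts} {dns_name}:/ {mount_point}",
    ]
    return "\n".join(lines) + "\n"


def encode_user_data(script: str) -> str:
    """Base64-encode the script, as the launch template API expects."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


# Hypothetical file system ID and region, for illustration only.
script = build_efs_user_data("fs-12345678", "us-east-1")
print(script)
```

The encoded string from `encode_user_data` is what would go into the launch template's user data field when creating it through the API. The instances also need a security group that allows NFS (TCP 2049) to the EFS mount targets, and a mount target in each AZ the ASG uses.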
answered 2 months ago
