Questions tagged with Availability
Content language: English
Sort by most recent
API deployed to Elastic Beanstalk suddenly stopped working
Hi, I'm pretty new to AWS. We deployed our Node.js app to Elastic Beanstalk, with a Postgres database on RDS. Everything worked fine for a few weeks, but since yesterday I suddenly cannot access the service. It does not respond. I tried looking at the logs on Elastic Beanstalk, but they are empty after December 1st. In the Chrome Network inspector, it just says: `Failed to load response data. Resource with given identifier does not exist`. We were using the free tier. I am sure something happened with the start of the new month, but I don't know what. I guess our plan expired or something like that. I've also tried accessing the environment over SSH (with a valid .pem file; it worked before), but now it says `Permission denied`. What could be the issue here? Thanks in advance
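Since the Elastic Beanstalk console logs appear empty, one way to dig further is to pull the environment's recent event stream via the API. A minimal boto3 sketch, assuming credentials are configured and using a placeholder environment name:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch: fetch recent Elastic Beanstalk events to see what
# happened around the start of the month. "my-env" is a placeholder.
def recent_event_params(environment_name, days=14):
    """Build kwargs for elasticbeanstalk.describe_events()."""
    return {
        "EnvironmentName": environment_name,
        "StartTime": datetime.now(timezone.utc) - timedelta(days=days),
        "MaxRecords": 100,
    }

# import boto3
# eb = boto3.client("elasticbeanstalk")
# for e in eb.describe_events(**recent_event_params("my-env"))["Events"]:
#     print(e["EventDate"], e["Severity"], e["Message"])
```

Severity `ERROR` or `WARN` events around December 1st (for example, a terminated environment or failed health checks) would narrow down what changed.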
What is the difference between Lightsail availability and EC2 availability?
I have been running my business on Lightsail compute for several months, but in recent months I have found that sometimes my customers cannot connect to the server. The outage only lasts a short time, and service is restored within a few minutes, but it still makes me worry about availability. From my investigation, when a client cannot reach the server, he also cannot ping the target IP address. Does this reflect a difference in availability between EC2 and Lightsail? Or is it an issue with the AWS network? If my business requires high network availability, will using the more expensive EC2 improve availability?
Client VPN availability question
The Client VPN examples at https://aws.amazon.com/blogs/networking-and-content-delivery/using-aws-client-vpn-to-scale-your-work-from-home-capacity/ use this as an example for a failover (?) setup between two AZs: ![Enter image description here](/media/postImages/original/IMs7uxqNEgTjib9fvgmkEeyQ) Is that enough to ensure connectivity between "remote workers" and VPC B/C/D in case of a problem in AZ A or AZ B? Is there any way I could realistically simulate a failure of one AZ? I have the recommended setup for TGW attachments in their own /28. My AZ A and AZ B in this case are not the subnets with the TGW attachment, because the CVPN endpoint doesn't allow association with a subnet smaller than /27 - would it be a better/worse idea or make any difference at all if I used /27 subnets for the TGW attachments so I could associate the CVPN endpoints with the same subnets? Thanks, Marc
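On "realistically simulating a failure of one AZ": one common approximation (not an exact simulation, and entirely a suggestion rather than anything from the linked blog post) is to associate every subnet in the target AZ with an empty network ACL, which black-holes that AZ's traffic, since a NACL with no allow rules denies everything by default. A boto3 sketch with placeholder IDs:

```python
# Hypothetical sketch: approximate an AZ failure by swapping a subnet's
# NACL association over to an empty, deny-all network ACL. A NACL with
# no allow rules denies all traffic by default. IDs are placeholders.
def blackhole_association(association_id, deny_all_acl_id):
    """Build kwargs for ec2.replace_network_acl_association()."""
    return {
        "AssociationId": association_id,  # current subnet<->NACL association
        "NetworkAclId": deny_all_acl_id,  # freshly created, empty NACL
    }

# import boto3
# ec2 = boto3.client("ec2")
# ec2.replace_network_acl_association(
#     **blackhole_association("aclassoc-0123456789abcdef0", "acl-0deny"))
```

Repeating this for each subnet in one AZ (and reverting afterwards) lets you observe whether the Client VPN endpoint's association in the other AZ keeps remote workers connected.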
Will my EC2 instance and any service running on it automatically reboot after an availability zone issue has been resolved?
Sometimes there is a malfunction in an availability zone that causes all services in that availability zone to go offline. When the AZ malfunction is resolved, do all services such as EC2 instances restart automatically? Or do I have to start them back up manually?
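If instances do not come back on their own after such an event, a sketch like the following (boto3, with a placeholder AZ name) could list the instances still stopped in the affected AZ so they can be started:

```python
# Hypothetical sketch: after an AZ event, find instances still in the
# "stopped" state in the affected AZ. The AZ name is a placeholder.
def stopped_in_az_filters(az):
    """Build Filters for ec2.describe_instances()."""
    return [
        {"Name": "availability-zone", "Values": [az]},
        {"Name": "instance-state-name", "Values": ["stopped"]},
    ]

# import boto3
# ec2 = boto3.client("ec2")
# pages = ec2.get_paginator("describe_instances").paginate(
#     Filters=stopped_in_az_filters("eu-west-1a"))
# ids = [i["InstanceId"] for p in pages
#        for r in p["Reservations"] for i in r["Instances"]]
# if ids:
#     ec2.start_instances(InstanceIds=ids)
```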
AWS Batch requesting more vCPUs than tasks require
Hi, We have an AWS Batch compute environment set up to use EC2 Spot Instances, with no limits on instance type, and with the `SPOT_CAPACITY_OPTIMIZED` allocation strategy. We submitted a task requiring 32 vCPUs and 58000 MB of memory (2 GB below the memory of the smallest 32-vCPU instance size, c3.8xlarge, just to leave a bit of headroom), which is reflected on the job status page. We expected to receive an instance with 32 vCPUs and >64 GB of memory, but received an `r4.16xlarge` with 64 vCPUs and 488 GB of memory. An `r4.16xlarge` is rather oversized for the single task in the queue, and our task can't take advantage of the extra cores, as we pin processes to the specified number of cores so that multiple tasks scheduled on the same host don't contend over CPU. We had no other tasks in the queue and no running compute instances, nor any desired/minimum capacity set on the compute environment before this task was submitted. The autoscaling history shows: `a user request update of AutoScalingGroup constraints to min: 0, max: 36, desired: 36 changing the desired capacity from 0 to provide the desired capacity of 36` Where did this 36 come from? Surely it should be 32, to match our task? I'm aware that the docs say: `However, AWS Batch might need to exceed maxvCpus to meet your capacity requirements. In this event, AWS Batch never exceeds maxvCpus by more than a single instance.` But we're concerned that once we start scaling up, each task will be erroneously requested with 4 extra vCPUs. My guess is that what happened here is due to the `SPOT_CAPACITY_OPTIMIZED` allocation strategy:
* Batch probably queried for the best available host to meet our 32-vCPU requirement and got the answer c4.8xlarge, which has 36 cores.
* Batch then told the Auto Scaling group to scale to 36 cores, expecting the Spot request to return a c4.8xlarge.
* The Spot allocation strategy is currently set to `SPOT_CAPACITY_OPTIMIZED`, which prefers instances that are less likely to be interrupted (rather than the cheapest or best-fitting).
* The Spot request looked at the availability of c4.8xlarge, decided under `SPOT_CAPACITY_OPTIMIZED` that it was too likely to be interrupted, and substituted the most-available host matching the 36-core requirement set by Batch, which turned out to be an oversized 64-vCPU r5 instead of the better-fitting-for-the-task 32- or 48-vCPU r5.

But the above implies that Batch itself doesn't follow the same logic as `SPOT_CAPACITY_OPTIMIZED`, and instead requests the specs of the "best fit" host even if that host will not be provided by the Spot request, resulting in potentially significantly oversized hosts. Alternatively, the 64-vCPU r5 happened to have better availability than the 48- or 32-vCPU r5, but I don't see how that is possible: the 64-vCPU r5 is just twice the 32-vCPU one, and these are virtualised hosts, so you would expect the availability of the 64-vCPU size to be half that of the 32-vCPU one. Can anyone confirm whether either of my guesses is correct, whether I'm thinking about this the wrong way, or whether we missed a configuration setting? Thanks!
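The first guess above can be written down as a toy model (entirely hypothetical; the vCPU and memory figures come from the question, and `pool_depth` is a made-up stand-in for whatever internal spot-capacity score AWS uses):

```python
# Toy model of the first guess: under SPOT_CAPACITY_OPTIMIZED, the
# "best" fitting type is the one from the deepest spot pool, and the ASG
# desired capacity is then set to THAT type's vCPU count, not the job's
# requested vCPUs. pool_depth is a hypothetical capacity score.
def pick_type_and_desired(job_vcpus, job_mem_mib, candidates):
    fitting = [c for c in candidates
               if c["vcpus"] >= job_vcpus and c["mem_mib"] >= job_mem_mib]
    best = max(fitting, key=lambda c: c["pool_depth"])
    return best["name"], best["vcpus"]

candidates = [
    {"name": "c3.8xlarge",  "vcpus": 32, "mem_mib": 61440,  "pool_depth": 1},
    {"name": "c4.8xlarge",  "vcpus": 36, "mem_mib": 61440,  "pool_depth": 5},
    {"name": "r4.16xlarge", "vcpus": 64, "mem_mib": 499712, "pool_depth": 3},
]
```

With these (assumed) pool depths, `pick_type_and_desired(32, 58000, candidates)` returns `("c4.8xlarge", 36)`, which would reproduce the mysterious desired capacity of 36 for a 32-vCPU job.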
Auto scaling - Capacity Rebalance kept stopping/terminating instances
We have recently been seeing an issue with Capacity Rebalancing:
```
Capacity rebalance
When you enable capacity rebalancing, and a rebalance notification is sent to an instance, EC2 Auto Scaling automatically attempts to replace the instance before it is interrupted.
```
When it is enabled, our instances keep terminating and stopping, which creates unnecessary load on our cache/DB/other infrastructure, as every new instance requires setup/init. This is NOT due to Spot Instances being reclaimed, as the termination reason is always:
```
At 2022-10-24T01:59:17Z an instance was taken out of service in response to an EC2 instance rebalance recommendation.
```
What we expect is that an instance is terminated only when:
1. there is a Spot capacity termination request, or
2. the Spot lifetime is exceeded.
Perhaps this is related to the recent capacity limit issue with us-east?
```
Launching a new EC2 instance. Status Reason: Could not launch Spot Instances. InsufficientInstanceCapacity - We currently do not have sufficient capacity in the Availability Zone you requested. Our system will be working on provisioning additional capacity.
```
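If the churn from rebalance recommendations outweighs the benefit of proactive replacement, Capacity Rebalancing can be switched off on the Auto Scaling group, after which instances are only replaced on actual interruption. A minimal boto3 sketch with a placeholder group name:

```python
# Sketch: disable Capacity Rebalancing on an Auto Scaling group so
# instances are no longer proactively replaced when EC2 sends a
# rebalance recommendation. The group name is a placeholder.
def disable_rebalance_params(asg_name):
    """Build kwargs for autoscaling.update_auto_scaling_group()."""
    return {
        "AutoScalingGroupName": asg_name,
        "CapacityRebalance": False,
    }

# import boto3
# boto3.client("autoscaling").update_auto_scaling_group(
#     **disable_rebalance_params("my-spot-asg"))
```

The trade-off is that without rebalancing, workloads only get the two-minute Spot interruption notice instead of the earlier rebalance signal.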
How long will the RDS outage last when enabling backups?
The documentation (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html) states that changing the backup retention period from 0 to a positive number causes an outage. Unfortunately, I cannot find any information on how long it will last. This is a production DB for a high-availability service, and I do not want the outage to take an hour or so. The DB size is significant - about 100 GB. Does anyone know how long it can take?
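For reference, the change itself is a single modify call, and deferring it to the next maintenance window (rather than applying immediately) is one way to control when the outage happens. A boto3 sketch with a placeholder identifier and an assumed retention of 7 days:

```python
# Sketch: enable automated backups by raising the retention period from
# 0 to a positive number. ApplyImmediately=False defers the change (and
# its outage) to the next maintenance window. The identifier and the
# 7-day retention are placeholders/assumptions.
def enable_backups_params(db_id, retention_days=7):
    """Build kwargs for rds.modify_db_instance()."""
    return {
        "DBInstanceIdentifier": db_id,
        "BackupRetentionPeriod": retention_days,
        "ApplyImmediately": False,
    }

# import boto3
# boto3.client("rds").modify_db_instance(**enable_backups_params("prod-db"))
```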
[🚀Launch Announcement] - AWS Gateway Load Balancer launches Target Failover feature
Hello, the ELB team is happy to announce that we just launched a new Target Failover feature that provides an option to define flow handling behavior for AWS Gateway Load Balancer. Using this option, customers can now rebalance existing flows to a healthy target when a target fails or deregisters. This helps reduce failover time when a target becomes unhealthy, and also allows customers to gracefully patch or upgrade appliances during maintenance windows. Launch Details: * This feature uses the existing ELB API/Console and provides new attributes to specify the flow handling behavior. You can use the existing modify-target-group-attributes API to define flow handling behavior using the two new attributes target_failover.on_unhealthy and target_failover.on_deregistration. * This feature does not change the default behavior, and existing GWLBs are not affected. * The feature is available using the API and the AWS Console. * The feature is available in all commercial, GovCloud, and China regions. It will be deployed in ADC regions at a later date based on demand. * Customers should evaluate the effect of enabling this feature on availability and check their third-party appliance provider's documentation. * AWS appliance partners should consider taking the following actions: (a) validate whether rebalancing existing flows to a healthy target has implications for their appliance, as it will start receiving the flow midway, i.e. without seeing the TCP SYN; (b) update public documentation on how this feature affects their appliance; (c) consider using this capability to improve stateful flow handling on their appliances.
Launch Materials: * Launch Blog - https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-aws-gateway-load-balancer-target-failover-for-existing-flows/ * Feature Documentation - https://docs.aws.amazon.com/elasticloadbalancing/latest/gateway/target-groups.html#target-failover * Attribute Documentation - https://docs.aws.amazon.com/elasticloadbalancing/latest/gateway/target-groups.html#target-group-attributes Thank you!
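The two attributes named above can be set through the existing modify-target-group-attributes call. A boto3 sketch (the target group ARN is a placeholder; `rebalance` is the value that enables rebalancing of existing flows, with the allowed values covered in the attribute documentation linked above):

```python
# Sketch: enable Target Failover flow rebalancing on a Gateway Load
# Balancer target group via the existing modify_target_group_attributes
# API. The target group ARN is a placeholder.
def failover_attrs_params(target_group_arn):
    """Build kwargs for elbv2.modify_target_group_attributes()."""
    return {
        "TargetGroupArn": target_group_arn,
        "Attributes": [
            {"Key": "target_failover.on_unhealthy", "Value": "rebalance"},
            {"Key": "target_failover.on_deregistration", "Value": "rebalance"},
        ],
    }

# import boto3
# boto3.client("elbv2").modify_target_group_attributes(
#     **failover_attrs_params(
#         "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
#         "targetgroup/my-gwlb-tg/0123456789abcdef"))
```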