Questions tagged with AWS Auto Scaling


Auto Scaling is not working with a Neptune cluster except when the primary writer instance type is db.r5d.xlarge

Issue: I am trying to auto scale my Amazon Neptune database to meet workload demands. Scale-up actions work fine with any instance size, and the CloudWatch alarm for scale-in is triggered, but the scale-in action only removes the readers when the writer instance type is db.r5d.xlarge. I did not set neptune_autoscaling_config in the cluster parameter group; otherwise I applied the same configuration as in the reference blog post below.

One thing is different: when I created the auto scaling configuration the first time, the writer instance was an r5d.xlarge. I later changed the writer instance size to t3.medium, deleted the old auto scaling application and scaling policy, deregistered the scale targets, and recreated everything. After that, the scale-up action still works fine, but the scale-in action only works when the writer is r5d.xlarge. I am not getting any errors from CloudWatch and the alarm action triggers successfully, but the Neptune reader that was created during scale-up is never removed, and there are no scaling activities explaining why the policy action could not delete the reader. The same setup works fine in our Prod and Stage accounts; this issue only occurs in the Dev account.

Note: scale-in (removing the Neptune reader) works fine through a scheduled action: https://docs.aws.amazon.com/autoscaling/application/userguide/examples-scheduled-actions.html

Can anyone please help me with this? Thanks in advance! This is the blog post I am using as a reference: https://aws.amazon.com/blogs/database/auto-scale-your-amazon-neptune-database-to-meet-workload-demands/
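For reference, this is roughly how the scalable target and target-tracking policy are set up (a minimal boto3 sketch following the blog post; the cluster ID, capacities, and target value are placeholders, not the exact values):

```python
# Minimal sketch of the Neptune auto scaling setup (boto3); the cluster ID,
# capacities, and target value below are placeholders, not the real values.
import boto3

aas = boto3.client("application-autoscaling")
resource_id = "cluster:my-neptune-cluster"  # placeholder cluster ID

# Register the Neptune read-replica count as a scalable target.
aas.register_scalable_target(
    ServiceNamespace="neptune",
    ResourceId=resource_id,
    ScalableDimension="neptune:cluster:ReadReplicaCount",
    MinCapacity=1,
    MaxCapacity=3,
)

# Target-tracking policy on average reader CPU utilization.
aas.put_scaling_policy(
    PolicyName="neptune-reader-cpu-tracking",
    ServiceNamespace="neptune",
    ResourceId=resource_id,
    ScalableDimension="neptune:cluster:ReadReplicaCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "NeptuneReaderAverageCPUUtilization",
        },
        "TargetValue": 60.0,
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 300,
    },
)

# How I look for scale-in attempts; in the Dev account this shows nothing
# that explains why the reader is not removed.
activities = aas.describe_scaling_activities(
    ServiceNamespace="neptune",
    ResourceId=resource_id,
)
for a in activities["ScalingActivities"]:
    print(a["StatusCode"], a["Description"], a.get("StatusMessage", ""))
```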
0
answers
0
votes
45
views
asked 5 months ago

ECS services not scaling in (scale-in protection is disabled)

Hello. I have an ECS cluster (EC2 based) attached to a capacity provider. The service scales out fine, but it is not scaling in, even though I have already checked scale-in protection and it is disabled (Disable Scale In: false).

Description of the environment:
- 1 cluster (EC2 based), 2 services
- Services are attached to an ALB (registering and deregistering fine)
- Services have auto scaling enabled on memory utilization (above 90%), no scale-in protection, 1 task minimum, 3 tasks maximum
- Services use a capacity provider, which appears to work as intended: it creates new EC2 instances when new tasks are provisioned, drops them when they have 0 tasks running, and registers/deregisters them as expected
- The CloudWatch alarms work fine and alarm when expected (on both low and high usage)

Description of the test and what is not working:
- Started with 1 task for each service and 1 instance for both services.
- I connected to one of the containers and ran a memory stress test, pushing its usage above 90%.
- The service detected it and requested a new task.
- No instance could place the new task, so ECS asked the capacity provider/Auto Scaling group for a new EC2 instance.
- The new instance was provisioned, registered in the cluster, and ran the new task.
- The service's average memory usage decreased from ~93% to ~73% (average across both tasks).
- All was fine; the memory stress ran for 20 minutes.
- After the memory stress was over, memory usage dropped to ~62%.
- The CloudWatch alarm was triggered (maybe even earlier, at ~73% usage, I didn't check).
- The service is still running 2 tasks right now (after 3 hours or more) and the desired count is not decreasing from 2 to 1.

Is there anything I'm missing here? I have already done a couple of tests, changing the service auto scaling thresholds and other configurations, but nothing changes this behaviour. Any help would be appreciated. Thanks in advance.
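In case it helps, this is how I have been checking the scaling policy and the scale-in activity from the API side (a minimal boto3 sketch; the cluster and service names are placeholders):

```python
# Minimal diagnostic sketch (boto3); "my-cluster" and "my-service" are placeholders.
import boto3

aas = boto3.client("application-autoscaling")
resource_id = "service/my-cluster/my-service"

# Confirm the policy really has scale-in enabled (DisableScaleIn should be False).
policies = aas.describe_scaling_policies(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
)
for p in policies["ScalingPolicies"]:
    cfg = p.get("TargetTrackingScalingPolicyConfiguration", {})
    print(p["PolicyName"], p["PolicyType"],
          "DisableScaleIn:", cfg.get("DisableScaleIn"),
          "ScaleInCooldown:", cfg.get("ScaleInCooldown"))

# See whether a scale-in was ever attempted and why it succeeded or failed.
activities = aas.describe_scaling_activities(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
)
for a in activities["ScalingActivities"]:
    print(a["StartTime"], a["StatusCode"], a["Description"])
```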
1
answers
0
votes
109
views
asked 6 months ago

EMR autoscaling: 'org.apache.hadoop.util.DiskChecker$DiskErrorException(No space available in any of the local directories.)'

I get the following error when running a Tez query. This is in an EMR cluster with auto scaling enabled.

Root device EBS volume size: 100 GiB
Additional EBS volume: 200 GiB

```
bash-4.2$ ls -lh /tmp
lrwxrwxrwx 1 root root 8 Jun 2 13:20 /tmp -> /mnt/tmp
```

/mnt has enough space:

```
/dev/dev1 195G 3.7G 192G 2% /mnt
```

```
INFO  : Cleaning up the staging area file:/tmp/hadoop/mapred/staging/hdfs1254373830/.staging/job_local1254373830_0002
ERROR : Job Submission failed with exception 'org.apache.hadoop.util.DiskChecker$DiskErrorException(No space available in any of the local directories.)'
org.apache.hadoop.util.DiskChecker$DiskErrorException: No space available in any of the local directories.
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:416)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:165)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:130)
	at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:123)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:172)
	at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:794)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:251)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:423)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:149)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
	at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:330)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. No space available in any of the local directories.
```
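In case it is relevant, this is the kind of check I run on the master node to see which local directories Hadoop is actually configured to use and how much space each one has (a rough diagnostic sketch, not part of the original setup; it assumes the hdfs CLI is on the PATH):

```python
# Rough diagnostic sketch: list Hadoop's local directories and their free space.
# Assumes it runs on the EMR master node with the `hdfs` CLI on the PATH.
import shutil
import subprocess

def conf_value(key: str) -> str:
    """Read an effective Hadoop configuration value via `hdfs getconf -confKey`."""
    return subprocess.check_output(
        ["hdfs", "getconf", "-confKey", key], text=True
    ).strip()

for key in ("hadoop.tmp.dir",
            "mapreduce.cluster.local.dir",
            "yarn.nodemanager.local-dirs"):
    try:
        dirs = conf_value(key).split(",")
    except subprocess.CalledProcessError:
        print(f"{key}: <not set>")
        continue
    for d in dirs:
        d = d.strip()
        try:
            usage = shutil.disk_usage(d)
            print(f"{key}: {d} -> {usage.free / 2**30:.1f} GiB free")
        except FileNotFoundError:
            print(f"{key}: {d} -> directory does not exist")
```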
1
answers
0
votes
57
views
asked 6 months ago

Auto Scaling Group not scaling based on ECS desired task count

I have an EC2-backed ECS cluster containing an ASG (using Cluster Auto Scaling) that is allowed to scale between 1 and 5 EC2 instances. There is also a service defined on this cluster which is set to scale between 1 and 5 tasks, with each task reserving almost the full resources of a single instance. I have configured the service to scale its desired task count based on the size of various queues in an Amazon MQ instance, which is all handled by CloudWatch alarms.

The scaling of the desired task count works as expected, but the ASG doesn't provision new EC2 instances to fit the number of desired tasks unless I manually go in and change the desired capacity of the ASG. This means the new tasks never get deployed, as ECS can't find any suitable instances to deploy them to. I don't know if I'm missing something, but all the documentation I have found on ECS Auto Scaling groups says they should scale instances to fit the total resources requested by the desired number of tasks.

If I manually increase the desired capacity in the ASG and an additional task gets deployed on that new instance, the `CapacityProviderReservation` still remains at 100%. If I then remove that second task, after a while the ASG scales in and removes the instance that no longer has any tasks running on it, which is the expected behaviour.

Any pointers would be greatly appreciated. As a side note, this is all set up using the Python CDK.

Edit: Clarified that the ASG is currently using CAS (as far as I can tell) and added details about scaling in working as expected.

Many thanks, Tom
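For context, the capacity provider wiring in the Python CDK looks roughly like the sketch below (a simplified illustration with placeholder construct IDs, not the exact stack code). With CAS, managed scaling on the capacity provider is what publishes `CapacityProviderReservation` and lets ECS drive the ASG's desired capacity, and the service has to run under a strategy that references that provider for its tasks to be counted:

```python
# Minimal CDK sketch (aws-cdk-lib, Python) of an ECS cluster with an ASG-backed
# capacity provider; construct IDs and values are placeholders, not the real stack.
from aws_cdk import aws_autoscaling as autoscaling, aws_ecs as ecs
from constructs import Construct

def wire_capacity_provider(scope: Construct, cluster: ecs.Cluster,
                           asg: autoscaling.AutoScalingGroup) -> ecs.AsgCapacityProvider:
    # Managed scaling attaches a target-tracking policy to the ASG based on
    # the CapacityProviderReservation metric.
    provider = ecs.AsgCapacityProvider(
        scope, "AsgCapacityProvider",
        auto_scaling_group=asg,
        enable_managed_scaling=True,
        target_capacity_percent=100,
    )
    cluster.add_asg_capacity_provider(provider)

    # The service must also use a strategy that references this provider,
    # otherwise its tasks are not counted in CapacityProviderReservation, e.g.:
    #   ecs.Ec2Service(scope, "Service", cluster=cluster,
    #                  task_definition=task_def,
    #                  capacity_provider_strategies=[
    #                      ecs.CapacityProviderStrategy(
    #                          capacity_provider=provider.capacity_provider_name,
    #                          weight=1)])
    return provider
```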
1
answers
0
votes
88
views
Tom-PH
asked 6 months ago

How to configure stickiness and autoscaling in an Elastic Beanstalk application

Hello. We have an application running on Elastic Beanstalk that listens for client requests and returns a stream segment. We have the following requirements:

1) Client sessions should be sticky (all requests for a given session should go to the same EC2 instance) for a specified time, without any changes on the client side (we cannot add cookie sending from the client). As I understand it, the Application Load Balancer supports this, so I enabled stickiness on the load balancer. My understanding is that load-balancer-generated cookies are managed by the load balancer itself, so we do not need to send a cookie from the client side.

2) Based on CPU utilisation we need to auto scale instances: when CPU load goes above 80%, we need to scale out by +1 instance.

Problems:

1) When I send requests from multiple clients on the same IP address, CPU load goes above 80% and a new instance is launched. But after some time I see the CPU load going down. Does this mean that one of the clients is now connected to the new instance and the load is being shared? That would mean stickiness is not working, though it is not clear how to test it properly. Sometimes when I stopped the new instance manually, no client got any errors; when I stopped the first instance, all clients got 404 errors for a while. How do I check whether stickiness is working properly?

2) If I do get stickiness to work, my understanding is that the load will not be shared by the new instance, so the average CPU usage will stay the same and autoscaling will keep launching new instances until the maximum limit. How do I combine stickiness with the autoscaling feature? I set the stickiness duration to 86400 seconds (24 hours) to be on the safe side.

Can someone please guide me on how to configure stickiness and autoscaling the proper way?
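For reference, stickiness and the CPU-based scaling trigger can both be expressed as Elastic Beanstalk option settings; here is a minimal boto3 sketch of the settings I am describing (the environment name is a placeholder and the values are examples, not a confirmed working configuration):

```python
# Minimal sketch (boto3) of Elastic Beanstalk option settings for ALB stickiness
# and a CPU-based scaling trigger; environment name and values are examples only.
import boto3

eb = boto3.client("elasticbeanstalk")

eb.update_environment(
    EnvironmentName="my-streaming-env",  # placeholder
    OptionSettings=[
        # ALB stickiness on the default process (load-balancer-generated cookie).
        {"Namespace": "aws:elasticbeanstalk:environment:process:default",
         "OptionName": "StickinessEnabled", "Value": "true"},
        {"Namespace": "aws:elasticbeanstalk:environment:process:default",
         "OptionName": "StickinessLBCookieDuration", "Value": "86400"},
        # Scale out by 1 instance when average CPU goes above 80%,
        # scale in by 1 when it drops below 40% (example thresholds).
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "MeasureName", "Value": "CPUUtilization"},
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "Statistic", "Value": "Average"},
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "Unit", "Value": "Percent"},
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "UpperThreshold", "Value": "80"},
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "UpperBreachScaleIncrement", "Value": "1"},
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "LowerThreshold", "Value": "40"},
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "LowerBreachScaleIncrement", "Value": "-1"},
        # Instance count bounds for the environment's Auto Scaling group.
        {"Namespace": "aws:autoscaling:asg",
         "OptionName": "MinSize", "Value": "1"},
        {"Namespace": "aws:autoscaling:asg",
         "OptionName": "MaxSize", "Value": "4"},
    ],
)
```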
3
answers
0
votes
56
views
asked 6 months ago