
Unanswered Questions tagged with AWS Auto Scaling


Auto scaling is not working with a Neptune cluster except when the primary writer instance type is db.r5d.xlarge.

Issue: Scale-up actions work fine with any instance size, but the scale-in action, although triggered by CloudWatch, is not able to remove the readers unless the writer is an r5d.xlarge.

I am trying to auto scale an Amazon Neptune database to meet workload demands. When the Neptune writer is an r5d.xlarge everything works fine, but when I change the writer instance size, scale-in stops working. I did not set neptune_autoscaling_config in the cluster parameter group. I applied the same configuration as in the reference blog post below. The one difference is that when I first created the auto scaling setup, the writer instance was an r5d.xlarge. I later changed the writer instance size to t3.medium, deleted the old auto scaling configuration and scaling policy, deregistered the scalable targets, and recreated everything. After that, the scale-up action works fine, but the scale-in action does not work unless the writer is an r5d.xlarge.

I am not getting any error from CloudWatch. The CloudWatch action is triggered successfully, but it does not remove the Neptune reader that was created by the scale-up action, and I also do not see any scaling activities explaining why the policy action cannot delete the reader. The same setup works fine in our Prod and Stage accounts; the issue only occurs in the Dev account.

Note: scale-in (removing a Neptune reader) works fine through a scheduled action: https://docs.aws.amazon.com/autoscaling/application/userguide/examples-scheduled-actions.html

Can anyone please help me with this? Thanks in advance! This is the blog post I am using as a reference: https://aws.amazon.com/blogs/database/auto-scale-your-amazon-neptune-database-to-meet-workload-demands/
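Not part of the original question, but a minimal diagnostic sketch of how the Neptune read-replica scaling target and its recent activities could be inspected with boto3; the cluster identifier and region are placeholders, and the namespace/dimension values follow the Application Auto Scaling documentation for Neptune:

```
# Sketch only: inspect the Application Auto Scaling setup for a Neptune cluster
# to see why a scale-in did not remove a reader. "my-neptune-cluster" and the
# region are placeholders, not values from the question above.
import boto3

aas = boto3.client("application-autoscaling", region_name="us-east-1")
resource_id = "cluster:my-neptune-cluster"

# Confirm the scalable target still exists with the expected min/max capacity.
targets = aas.describe_scalable_targets(
    ServiceNamespace="neptune",
    ResourceIds=[resource_id],
)
for t in targets["ScalableTargets"]:
    print(t["ScalableDimension"], t["MinCapacity"], t["MaxCapacity"])

# Each scaling activity carries a StatusCode/StatusMessage that usually explains
# why a scale-in was skipped or failed, even when CloudWatch shows no error.
activities = aas.describe_scaling_activities(
    ServiceNamespace="neptune",
    ResourceId=resource_id,
    ScalableDimension="neptune:cluster:ReadReplicaCount",
)
for a in activities["ScalingActivities"]:
    print(a["StartTime"], a["StatusCode"], a.get("StatusMessage", ""), a["Cause"])
```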
0 answers | 0 votes | 42 views | asked 2 months ago

CloudFormation - Auto Scaling: how do I get the sum (not the average) of the metric from all nodes?

I set my threshold to scale up when CPU usage is above 80% and to scale in when it is below 70%. The problem is that (as far as I know) the average value across the Auto Scaling group is used. Why is that a problem? Example situation:

1. There is one node and I put it under 100% CPU load.
2. The alarm is triggered and another instance is created.
3. The metric is now divided by 2, so `(100% + 0%) / 2 = 50%`, which is below 70%, so the scale-in alarm is triggered and one node is destroyed even though the other node is still loaded at 100%.

Ideally, for scaling in I would use not the average but the SUM of the load across all nodes. There is the `AWS::CloudWatch::Alarm/Properties/Statistic` setting with Average or Sum values, but doesn't that apply to the evaluation periods rather than to the number of instances in the given dimension? https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-cw-alarm.html#cfn-cloudwatch-alarms-statistic

My template:

```
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Creates Autoscaling group. Used securitygroup ids and subnets ids are hardcoded.",
  "Parameters": {
    "myprojectAmiId": {
      "Description": "New AMI ID which will be used to create/update autoscaling group",
      "Type": "AWS::EC2::Image::Id"
    },
    "myprojectNodesDefaultQuantity": {
      "Type": "Number",
      "MinValue": "1"
    }
  },
  "Resources": {
    "myprojectLaunchTemplate": {
      "Type": "AWS::EC2::LaunchTemplate",
      "Properties": {
        "LaunchTemplateData": {
          "IamInstanceProfile": {
            "Arn": "arn:aws:iam::censored6:instance-profile/myproject-ec2"
          },
          "ImageId": { "Ref": "myprojectAmiId" },
          "InstanceType": "t3a.small",
          "KeyName": "my-ssh-key",
          "SecurityGroupIds": [ "sg-censored", "sg-censored", "sg-censored5", "sg-censored" ]
        }
      }
    },
    "myprojectAutoScalingGroup": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",
      "UpdatePolicy": {
        "AutoScalingRollingUpdate": {
          "MaxBatchSize": "1",
          "MinInstancesInService": "1",
          "PauseTime": "PT5M",
          "WaitOnResourceSignals": "true"
        }
      },
      "Properties": {
        "MinSize": { "Ref": "myprojectNodesDefaultQuantity" },
        "MaxSize": "3",
        "HealthCheckGracePeriod": 300,
        "LaunchTemplate": {
          "LaunchTemplateId": { "Ref": "myprojectLaunchTemplate" },
          "Version": { "Fn::GetAtt": [ "myprojectLaunchTemplate", "LatestVersionNumber" ] }
        },
        "VPCZoneIdentifier": [ "subnet-censored", "subnet-0censoredc" ],
        "TargetGroupARNs": [ "arn:aws:elasticloadbalancing:us-west-2:censored:targetgroup/autoscaling-tests-targetgroup/censored" ],
        "Tags": [
          { "Key": "Name", "Value": "myproject-cloudformation-ascaling-tests", "PropagateAtLaunch": true },
          { "Key": "Stack", "Value": "dev-staging", "PropagateAtLaunch": true },
          { "Key": "CreatedBy", "Value": "cloudformation", "PropagateAtLaunch": true }
        ]
      }
    },
    "myprojectScaleUpPolicy": {
      "Type": "AWS::AutoScaling::ScalingPolicy",
      "Properties": {
        "AdjustmentType": "ChangeInCapacity",
        "AutoScalingGroupName": { "Ref": "myprojectAutoScalingGroup" },
        "Cooldown": "60",
        "ScalingAdjustment": 1
      }
    },
    "myprojectScaleDownPolicy": {
      "Type": "AWS::AutoScaling::ScalingPolicy",
      "Properties": {
        "AdjustmentType": "ChangeInCapacity",
        "AutoScalingGroupName": { "Ref": "myprojectAutoScalingGroup" },
        "Cooldown": "60",
        "ScalingAdjustment": -1
      }
    },
    "myprojectCPUAlarmHigh": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmActions": [ { "Ref": "myprojectScaleUpPolicy" } ],
        "AlarmDescription": "Scale-up if CPU > 80% for 5 minutes",
        "ComparisonOperator": "GreaterThanThreshold",
        "Dimensions": [
          { "Name": "AutoScalingGroupName", "Value": { "Ref": "myprojectAutoScalingGroup" } }
        ],
        "EvaluationPeriods": 2,
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/EC2",
        "Period": 30,
        "Statistic": "Average",
        "Threshold": 80
      }
    },
    "myprojectCPUAlarmLow": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmActions": [ { "Ref": "myprojectScaleDownPolicy" } ],
        "AlarmDescription": "Scale-down if CPU < 70% for 10 minutes",
        "ComparisonOperator": "LessThanThreshold",
        "Dimensions": [
          { "Name": "AutoScalingGroupName", "Value": { "Ref": "myprojectAutoScalingGroup" } }
        ],
        "EvaluationPeriods": 2,
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/EC2",
        "Period": 600,
        "Statistic": "Average",
        "Threshold": 70
      }
    }
  }
}
```
0 answers | 0 votes | 19 views | asked 6 months ago

EKS HPA apiVersion fails to stay at v2beta2

When I deploy my HPAs I am choosing ``apiVersion: autoscaling/v2beta2``, but Kubernetes is turning them into autoscaling/v2beta1. For example, if I deploy this:

```
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: surething-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: surething
  minReplicas: 2
  maxReplicas: 4
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 100
```

I will get this back:

```
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: surething-hpa
  namespace: dispatch-dev
  uid: 189cee35-c000-410b-954e-c164a08809e1
  resourceVersion: '404150989'
  creationTimestamp: '2021-04-04T17:30:48Z'
  labels:
    app: dispatch
    deployment: dev
    microservice: surething
  annotations: ...
  selfLink: ...
status: ...
spec:
  scaleTargetRef:
    kind: Deployment
    name: surething
    apiVersion: apps/v1
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 100
```

All the documentation I can find on EKS and HPAs says that I should be able to use ``apiVersion: autoscaling/v2beta2``. My cluster is on version 1.21 and so is my node group. When I run ``kubectl api-versions``, I can see ``autoscaling/v2beta2`` in the list. I'm at my wit's end on this one. Can someone tell me what I am doing wrong?
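Not from the original question, but a minimal sketch of how the same HPA could be read back explicitly through the v2beta2 endpoint with the official Python client, assuming a client release that still ships `AutoscalingV2beta2Api` (one matching a 1.21 cluster); only the name and namespace come from the manifests above:

```
# Sketch only: fetch the HPA through the autoscaling/v2beta2 API explicitly.
# Assumes a kubernetes Python client version that still exposes AutoscalingV2beta2Api.
from kubernetes import client, config

config.load_kube_config()

v2beta2 = client.AutoscalingV2beta2Api()
hpa = v2beta2.read_namespaced_horizontal_pod_autoscaler(
    name="surething-hpa",
    namespace="dispatch-dev",
)

# The API server stores one internal representation and converts it to whichever
# group/version a request asks for, so reading via v2beta2 should surface the
# spec.behavior and metrics target fields even if another client displays v2beta1.
print(hpa.spec.behavior)
print(hpa.spec.metrics)
```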
0 answers | 0 votes | 9 views | asked 6 months ago

How do I make an ECS cluster spawn GPU instances with a larger root volume than the default?

I need to deploy an ML app that needs GPU access for its response times to be acceptable (it uses some heavy networks that run too slowly on CPU). The app is containerized and uses an nvidia/cuda base image so that it can make use of its host machine's GPU. The image alone weighs ~10GB, and during startup it pulls several ML models and data that take up about another ~10GB of disk.

We were previously running this app on Elastic Beanstalk, but we realized it doesn't support GPU usage, even when specifying a Deep Learning AMI, so we migrated to ECS, which provides more configurability than the former. However, we soon ran into a new problem: **selecting a g4dn instance type when creating a cluster, which defaults the AMI to an ECS GPU one, turns the Root EBS Volume Size field into a Data EBS Volume Size field.** This causes the instance's 22GB root volume (the only one that comes formatted and mounted) to be too small for pulling our image and downloading the data it needs during startup. The other volume (of whatever size I specify during creation in the new Data EBS Volume Size field) is not mounted and therefore not accessible by the container. Additionally, the g4dn instances come with a 125GB SSD that is not mounted either. If either of these were usable, or if it were possible to enlarge the root volume (which it is when using the default non-GPU AMI), ECS would be the perfect solution for us at this time.

For the moment, we have worked around this issue by creating an *empty* cluster in ECS and then manually creating and attaching an Auto Scaling group to it, since when using a launch configuration or template the root volume's size can be specified correctly, even while using the exact same ECS GPU AMI that ECS does (see the sketch after this question). However, this is a tiresome process, and it makes us lose valuable ECS functionality such as automatically spawning a new instance during a rolling update to maintain capacity.

Am I missing something here? Is this a bug that will be fixed at some point? If it's not, is there a simpler way to achieve what I need? Maybe by specifying a custom launch configuration for the ECS cluster, or by automatically mounting the SSD on instance launch? Any help is more than appreciated. Thanks in advance!
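Not part of the original question, but a minimal sketch of the launch-template half of the workaround described above, using boto3: resolve the ECS GPU-optimized AMI from its public SSM parameter and enlarge the root volume in the template. The template name, instance type, volume size, cluster name, and region are placeholders, and the root device name is assumed to be `/dev/xvda` as on the Amazon Linux 2 ECS AMIs:

```
# Sketch only: a launch template that uses the ECS GPU-optimized AMI with a larger
# root volume. All names and sizes below are placeholders.
import base64

import boto3

ssm = boto3.client("ssm", region_name="us-east-1")
ec2 = boto3.client("ec2", region_name="us-east-1")

# Resolve the current ECS GPU-optimized Amazon Linux 2 AMI from the public SSM parameter.
ami_id = ssm.get_parameter(
    Name="/aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended/image_id"
)["Parameter"]["Value"]

# Register the instance with an existing ECS cluster at boot (placeholder cluster name).
user_data = "#!/bin/bash\necho ECS_CLUSTER=my-gpu-cluster >> /etc/ecs/ecs.config\n"

ec2.create_launch_template(
    LaunchTemplateName="gpu-ecs-big-root",
    LaunchTemplateData={
        "ImageId": ami_id,
        "InstanceType": "g4dn.xlarge",
        "BlockDeviceMappings": [
            {
                # Assumed root device name for the Amazon Linux 2 ECS AMIs,
                # with the volume enlarged beyond the default.
                "DeviceName": "/dev/xvda",
                "Ebs": {"VolumeSize": 100, "VolumeType": "gp3", "DeleteOnTermination": True},
            }
        ],
        "UserData": base64.b64encode(user_data.encode()).decode(),
    },
)
```

An Auto Scaling group created from this template can then be attached to the cluster, as the question describes.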
0 answers | 0 votes | 32 views | asked 8 months ago