A CloudWatch metric is a unique combination of a Namespace, MetricName, Dimension(s), and Unit.
Some metrics won't be pushed with any Dimensions or a Unit, but when they are, the full list must match exactly, or it's a different metric. SageMaker pushes these metrics with EndpointName + VariantName as 2 dimensions on the metric: https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html#cloudwatch-metrics-endpoint-invocation
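To make the "must match exactly" point concrete, here is a minimal sketch in plain Python (no AWS calls). It models a metric's identity the way the answer above describes it; the `metric_identity` helper, the endpoint name `my-endpoint`, and the variant name `AllTraffic` are made up for illustration.

```python
def metric_identity(namespace, metric_name, dimensions=None, unit=None):
    """Return a hashable key modelling a metric's identity.

    Dimension order is ignored, but the full set of dimension
    name/value pairs has to be identical for two keys to be equal.
    """
    dims = frozenset((dimensions or {}).items())
    return (namespace, metric_name, dims, unit)


# What SageMaker publishes (hypothetical endpoint and variant names).
published = metric_identity(
    "AWS/SageMaker",
    "Invocations",
    {"EndpointName": "my-endpoint", "VariantName": "AllTraffic"},
)

# An alarm that only specifies EndpointName is looking at a different metric.
alarm_missing_variant = metric_identity(
    "AWS/SageMaker",
    "Invocations",
    {"EndpointName": "my-endpoint"},
)

# Same dimensions in a different order still identify the same metric.
alarm_full = metric_identity(
    "AWS/SageMaker",
    "Invocations",
    {"VariantName": "AllTraffic", "EndpointName": "my-endpoint"},
)

print(published == alarm_missing_variant)  # False
print(published == alarm_full)             # True
```

An alarm defined on the second key would typically sit in the insufficient data state, because nothing is ever published under that exact combination.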
With Step Scaling, you are in charge of creating the alarm for the policy. With target tracking, you define the policy and Auto Scaling creates and manages the alarms for you. Make sure the alarm is actually going into the ALARM state, and that its action was created correctly (pointing at the step scaling policy).
Generally you'll create 2 alarms: one linked to a step scale-in policy, and a second linked to a step scale-out policy (whereas with target tracking, you create a single policy which handles both scale-in and scale-out). This is likely why you're not seeing scaling happen as expected: the alarm action only triggers when usage is above 70, and the policy applies a -1 adjustment when that happens. The +1 adjustment on that policy will never trigger, since it's set to apply when CPU is greater than 130% (step upper/lower bounds are relative to the alarm threshold): https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-step-scaling-policies.html#as-scaling-steps
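Here is a rough sketch of how step bounds resolve relative to the alarm threshold. It is plain Python, not an AWS API. The step values are my reconstruction of the policy described above (threshold 70, -1 for the first 60 points above it, +1 beyond that), and the scale-in threshold of 30 is purely hypothetical. The inclusive/exclusive handling of the bounds follows my reading of the linked step adjustment docs, so treat it as an assumption.

```python
from typing import List, Optional, Tuple

# (MetricIntervalLowerBound, MetricIntervalUpperBound, ScalingAdjustment)
Step = Tuple[Optional[float], Optional[float], int]


def resolve_adjustment(metric_value: float, threshold: float,
                       steps: List[Step]) -> Optional[int]:
    """Return the adjustment of the step matching the breach size, if any."""
    delta = metric_value - threshold  # bounds are offsets from the threshold
    for lower, upper, adjustment in steps:
        lo = float("-inf") if lower is None else lower
        hi = float("inf") if upper is None else upper
        if delta >= 0:
            matched = lo <= delta < hi   # above threshold: lower bound inclusive
        else:
            matched = lo < delta <= hi   # below threshold: upper bound inclusive
        if matched:
            return adjustment
    return None


# Single policy as described in the question (reconstructed, threshold 70).
original_steps = [(0, 60, -1), (60, None, +1)]
print(resolve_adjustment(85, 70, original_steps))   # -1: scales IN at 85% CPU
print(resolve_adjustment(135, 70, original_steps))  # 1: only above 130%

# Two policies, each driven by its own alarm.
scale_out_steps = [(0, None, +1)]  # alarm: CPU > 70
scale_in_steps = [(None, 0, -1)]   # alarm: CPU < 30 (hypothetical threshold)
print(resolve_adjustment(85, 70, scale_out_steps))  # 1
print(resolve_adjustment(20, 30, scale_in_steps))   # -1
```

The first two calls show the problem: with a threshold of 70, a lower bound of 60 means 130% CPU, so the only step that can realistically fire is the -1.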
Hi, did you try with a longer period or a higher number of datapoints to alarm? If the alarm remains in insufficient data even though you can see the metric's graph on the alarm detail page, it may be that the metric is ingested with a slightly higher delay than you expect, so the alarm doesn't see fresh data when it evaluates. If that is the case, you can work around it either by using a longer evaluation period, or by wrapping the metric in a FILL(m1, REPEAT) metric math function (see https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html). Whether that helps you or not, I'd also suggest you raise this as an issue with support.
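For reference, a sketch of what an alarm definition using FILL(m1, REPEAT) might look like, written as a plain Python dict shaped like the keyword arguments of boto3's `put_metric_alarm`. The helper name `build_fill_alarm_kwargs`, the alarm name, the namespace, the metric name, the 60-second period, and the 2-out-of-3 datapoint setting are all placeholders I picked for illustration; substitute whatever your endpoint actually publishes.

```python
def build_fill_alarm_kwargs(alarm_name, namespace, metric_name,
                            endpoint_name, variant_name, threshold):
    """Build alarm arguments that evaluate FILL(m1, REPEAT) instead of m1."""
    return {
        "AlarmName": alarm_name,
        "Metrics": [
            {
                # Raw metric; both dimensions must match what is published.
                "Id": "m1",
                "MetricStat": {
                    "Metric": {
                        "Namespace": namespace,
                        "MetricName": metric_name,
                        "Dimensions": [
                            {"Name": "EndpointName", "Value": endpoint_name},
                            {"Name": "VariantName", "Value": variant_name},
                        ],
                    },
                    "Period": 60,
                    "Stat": "Average",
                },
                "ReturnData": False,
            },
            {
                # Repeat the last known value when a datapoint is late or missing.
                "Id": "e1",
                "Expression": "FILL(m1, REPEAT)",
                "Label": "metric with gaps filled",
                "ReturnData": True,  # the alarm evaluates this expression
            },
        ],
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold,
        "EvaluationPeriods": 3,
        "DatapointsToAlarm": 2,
    }


# Placeholder values only; none of these names come from the question.
kwargs = build_fill_alarm_kwargs(
    "example-scale-out-alarm", "ExampleNamespace", "ExampleMetric",
    "my-endpoint", "AllTraffic", 70.0,
)
print(kwargs["Metrics"][1]["Expression"])  # FILL(m1, REPEAT)
```

If this shape suits you, the dict could be passed as `boto3.client("cloudwatch").put_metric_alarm(**kwargs)`. The alarm action pointing at the scaling policy is left out here and would still need to be added.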
Hey, thanks for the reply. I just made a hack: I copied the alarm that's auto-generated by TargetTrackingScaling and edited it to trigger the StepScalingPolicy instead (I don't know if this even makes sense).
Anyway, the new alarm was able to get the CPU usage data and move between the 'in alarm' and 'OK' states as expected.
The issue now is that I still don't see new instances being launched even when the alarm is in the 'in alarm' state. Am I supposed to NOT manually create this alarm at all? If so, how else should StepScalingPolicy work?
Thanks! Following your advice, I was able to fix it by creating 2 alarms manually, each mapped to one policy.
"Step upper/lower bounds are relative to the alarm threshold": apparently I wasn't aware of this.