Hi m0ltar,
A few questions:
- Is this a custom metric or an AWS service metric? Does it emit values when the process is not running?
- What value does it report when the process fails, and when it succeeds?
- How long does the process run, and what timestamp does the metric carry for a given run: the time it failed or succeeded, or the time it started?
- Can you please elaborate on "How do I get an alarm on failure, but clear it once the process succeeds?"
Generally, an "M out of N" alarm evaluation setting (M datapoints to alarm out of N evaluation periods) fits this scenario, and most importantly you have to define a feasible Period for the alarm. You can also configure Treat Missing Data, or use the FILL() metric math function, to keep the alarm from flapping into the Insufficient Data state.
Happy to discuss further if you can provide more information about your use case.
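As a rough illustration of those settings, here is a minimal sketch that builds the arguments for CloudWatch's PutMetricAlarm API. The alarm name, the ARNs, the 300-second Period, and the 1-out-of-1 evaluation are placeholder assumptions rather than recommendations; pick values that match how often the process runs.

```python
from typing import Any, Dict


def build_failure_alarm_params(state_machine_arn: str, sns_topic_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": "step-function-execution-failed",  # hypothetical name
        "Namespace": "AWS/States",
        "MetricName": "ExecutionsFailed",
        "Dimensions": [{"Name": "StateMachineArn", "Value": state_machine_arn}],
        "Statistic": "Sum",
        "Period": 300,              # assumed: 5-minute periods
        "EvaluationPeriods": 1,     # N: how many recent periods are evaluated
        "DatapointsToAlarm": 1,     # M: breaching datapoints needed within N
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Treat periods without data as "not breaching" so the alarm
        # does not drop into INSUFFICIENT_DATA between runs.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder ARNs, for illustration only.
params = build_failure_alarm_params(
    "arn:aws:states:us-east-1:123456789012:stateMachine:example",
    "arn:aws:sns:us-east-1:123456789012:example-topic",
)

# With boto3 installed and credentials configured, this could be applied with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

With "notBreaching", periods in which the state machine did not run count as healthy when the alarm is evaluated.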
Is this a custom metric or AWS Service metric? and whether it emits values when the process is not running?
The alarm is for step function execution failures.
What value does it put when it fails and when it succeed?
Metric name is ExecutionsFailed
How long does the Process run?
Up to 10 minutes
and what will be the timestamp of the metric for that run of the process?
The time the Step Function failed.
Can you please elaborate further for
How do I get an alarm on failure, but clear it once the process succeeds?
1. I run the step function and it fails
2. Alarm is raised, notification is sent via SNS
3. I go in and fix whatever caused the failure, and manually re-run the step function
4. Step function execution succeeds
5. The "In Alarm" screen should no longer list the alarm, since the execution has finally succeeded
The last step is what is not clear to me.
Thanks!
According to https://docs.aws.amazon.com/step-functions/latest/dg/procedure-cw-metrics.html#cloudwatch-step-functions-execution-metrics, Step Functions produces the "ExecutionsFailed" metric when an execution fails and the "ExecutionsSucceeded" metric when an execution succeeds. These two metrics are the opposite of each other.
A standard CloudWatch alarm monitors a single metric, or a single value produced by a metric math expression. Since you are monitoring the "ExecutionsFailed" metric, the alarm can be raised when this metric reports a value greater than 0. Once the process runs again and succeeds, it reports 0 for "ExecutionsFailed" and a count of 1 for "ExecutionsSucceeded" at the same time. So the "ExecutionsSucceeded" value has no bearing on an alarm that monitors "ExecutionsFailed".
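For the metric math case, a minimal sketch of the alarm arguments might look like the following. It assumes the alarm watches FILL(m1, 0), where m1 is the "ExecutionsFailed" sum; the alarm name, the ids m1 and e1, the label, and the 300-second period are illustrative choices only.

```python
from typing import Any, Dict


def build_fill_alarm_params(state_machine_arn: str) -> Dict[str, Any]:
    """Build PutMetricAlarm arguments for an alarm on a metric math expression."""
    return {
        "AlarmName": "step-function-execution-failed-filled",  # hypothetical name
        "Metrics": [
            {
                "Id": "m1",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/States",
                        "MetricName": "ExecutionsFailed",
                        "Dimensions": [
                            {"Name": "StateMachineArn", "Value": state_machine_arn}
                        ],
                    },
                    "Period": 300,  # assumed: 5-minute periods
                    "Stat": "Sum",
                },
                "ReturnData": False,  # input only; the alarm does not watch m1 directly
            },
            {
                "Id": "e1",
                # Replace missing datapoints with 0 so idle periods count as healthy.
                "Expression": "FILL(m1, 0)",
                "Label": "ExecutionsFailed (missing treated as 0)",
                "ReturnData": True,  # the single value the alarm evaluates
            },
        ],
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Placeholder ARN, for illustration only.
fill_params = build_fill_alarm_params(
    "arn:aws:states:us-east-1:123456789012:stateMachine:example"
)
```

Only one entry in "Metrics" returns data, which matches the point that an alarm evaluates a single value.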
Further, the alarm's configuration controls how long a window is evaluated to trigger the alarm, and also how long the alarm stays in the ALARM state: it stays there as long as the breaching datapoint remains within the evaluation period/evaluation range. Once the breaching datapoint falls outside the evaluation period/evaluation range, the alarm transitions back to the OK state, provided the OK criteria are met and the other configurations are in place.
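The effect of the evaluation window can be sketched with a simplified model. This is not CloudWatch's actual evaluation algorithm: it assumes one datapoint per period, treats missing datapoints (None) as not breaching, and ignores the INSUFFICIENT_DATA state. The function name and the sample data are made up for illustration.

```python
from typing import List, Optional


def alarm_states(
    datapoints: List[Optional[float]],
    evaluation_periods: int,
    datapoints_to_alarm: int,
    threshold: float = 0.0,
) -> List[str]:
    """Return the simulated alarm state after each period (simplified model)."""
    states = []
    for i in range(len(datapoints)):
        # The last N periods up to and including the current one.
        window = datapoints[max(0, i - evaluation_periods + 1): i + 1]
        # Missing datapoints (None) are treated as not breaching.
        breaching = sum(1 for v in window if v is not None and v > threshold)
        states.append("ALARM" if breaching >= datapoints_to_alarm else "OK")
    return states


# ExecutionsFailed per period: one failure in the second period, then idle
# periods, then a period where the metric reports 0.
failed = [0, 1, None, None, 0]

print(alarm_states(failed, evaluation_periods=1, datapoints_to_alarm=1))
# ['OK', 'ALARM', 'OK', 'OK', 'OK']

print(alarm_states(failed, evaluation_periods=3, datapoints_to_alarm=1))
# ['OK', 'ALARM', 'ALARM', 'ALARM', 'OK']
```

In this model a 1-out-of-1 alarm clears in the period right after the failure, while a 1-out-of-3 alarm stays in ALARM until the failing datapoint has left the three-period window.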
Needs more clarification on step 5.
Ok, I am sorry, but I am not following. Are you saying I need to create a math expression and add counts for ExecutionsFailed and ExecutionsSucceeded?