Need a metric in AWS Glue for Glue failed jobs


What is the CloudWatch metric for failed Glue jobs/runs?

Simply put, I need a metric for failed Glue jobs/runs.

The GlueExceptionAnalysisListener seems to be the only thing capturing failed Glue jobs/runs, and job/run failures are still not simple to find within CloudWatch, let alone exposed by Glue as a metric.

I am looking for something like this:

glue.X.executor.failedjobs
glue.X.executor.failedruns
glue.X.executor.completedjobs
glue.X.executor.completedruns

Ultimately, I am looking to pipe this into a third-party platform observability tool.

I need something that is in line with the following metrics: Glue Metrics

answered 5 months ago
reviewed 19 days ago

Normally you don't expect jobs to fail regularly; instead, you alarm when a job fails.
If you want to create that metric, you could have an EventBridge rule trigger a Lambda function when a job run ends and update a custom metric depending on the outcome.
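
A minimal sketch of that approach, assuming an EventBridge rule that matches Glue "Glue Job State Change" events and targets a Python Lambda function. The namespace `Custom/GlueJobs` and the metric names `FailedRuns` and `CompletedRuns` are hypothetical; pick whatever fits your observability tool. Treating `TIMEOUT` as a failure is also an assumption.

```python
# Hypothetical names (not AWS defaults); adjust to your own conventions.
NAMESPACE = "Custom/GlueJobs"
FAILED_STATES = {"FAILED", "TIMEOUT"}  # assumption: timeouts count as failures

# Event pattern for the EventBridge rule that targets this Lambda function.
EVENT_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"state": ["SUCCEEDED", "FAILED", "TIMEOUT"]},
}


def build_metric_data(event):
    """Translate one Glue Job State Change event into CloudWatch metric data."""
    detail = event.get("detail", {})
    state = detail.get("state")
    if state not in FAILED_STATES and state != "SUCCEEDED":
        return []  # ignore states this sketch does not track
    failed = 1 if state in FAILED_STATES else 0
    dimensions = [{"Name": "JobName", "Value": detail.get("jobName", "unknown")}]
    # Publish both metrics on every run so the failure metric also gets zeros.
    return [
        {"MetricName": "FailedRuns", "Dimensions": dimensions,
         "Value": failed, "Unit": "Count"},
        {"MetricName": "CompletedRuns", "Dimensions": dimensions,
         "Value": 1 - failed, "Unit": "Count"},
    ]


def lambda_handler(event, context):
    """Lambda entry point: publish the metric data to CloudWatch."""
    import boto3  # imported here so build_metric_data can be tested without it

    metric_data = build_metric_data(event)
    if metric_data:
        boto3.client("cloudwatch").put_metric_data(
            Namespace=NAMESPACE, MetricData=metric_data
        )
    return {"published": len(metric_data)}
```

The Lambda execution role would need `cloudwatch:PutMetricData`. Once the metric exists in CloudWatch, a third-party tool that already reads CloudWatch metrics should be able to pick it up like any other.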

answered 6 months ago
  • Interesting. Are there any other methods of accomplishing this?

    Again, we are trying to get this into our third-party platform observability tool as a metric, and it would be used as an emergency type of metric to wake up the troops to look into the issue. I understand it is extremely rare that jobs/runs fail, but that is even more reason we would like the metric.

  • An EventBridge rule is more timely and actionable than any metric, but if you want to do something more complex (for example, alert if a job fails x times over period y), you could use such a metric. You would have to build it yourself from the rule action (e.g. calling a Lambda function).
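
The "fails x times over period y" case could be sketched as a CloudWatch alarm on a self-built failure metric. This assumes the custom metric is published under the hypothetical namespace `Custom/GlueJobs` with the hypothetical name `FailedRuns` and a `JobName` dimension; none of these are provided by Glue itself.

```python
def failure_alarm_kwargs(job_name, max_failures, period_seconds):
    """Build put_metric_alarm arguments: alarm when the sum of failures
    within one period reaches max_failures."""
    return {
        "AlarmName": f"glue-{job_name}-failed-runs",  # hypothetical naming scheme
        "Namespace": "Custom/GlueJobs",               # hypothetical custom namespace
        "MetricName": "FailedRuns",                   # hypothetical custom metric
        "Dimensions": [{"Name": "JobName", "Value": job_name}],
        "Statistic": "Sum",
        "Period": period_seconds,  # for standard-resolution metrics, a multiple of 60
        "EvaluationPeriods": 1,
        "Threshold": max_failures,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no data points means no failures
    }


def create_failure_alarm(job_name, max_failures, period_seconds):
    """Create or update the alarm in CloudWatch."""
    import boto3  # imported here so the kwargs builder can be tested without it

    boto3.client("cloudwatch").put_metric_alarm(
        **failure_alarm_kwargs(job_name, max_failures, period_seconds)
    )
```

For example, `create_failure_alarm("etl", 3, 3600)` would alarm once the hypothetical job `etl` records three or more failures within an hour. Alarm actions (such as an SNS topic) are left out of this sketch.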

