Glue glue.driver.aggregate.numFailedTasks not reporting correctly


We are using something similar to the following Lambda function and collecting custom Glue metrics as described in this article: https://medium.com/@ettefette/metrics-for-aws-glue-jobs-as-you-know-them-from-lambda-functions-e5e1873c615c

However, the number of failed tasks shown in the Glue console differs from what the CloudWatch metrics report when we look at Count('Glue glue.driver.aggregate.numFailedTasks').

import boto3


def handler(event, context):
    # Job name and run ID come from the event detail.
    job_name = event["detail"]["jobName"]
    job_run_id = event["detail"]["jobRunId"]

    cloudwatch = boto3.client("cloudwatch", region_name="eu-central-1")

    if event["detail-type"] == "Glue Job State Change":
        job_status = event["detail"]["state"]

        if job_status not in ["SUCCEEDED", "FAILED", "TIMEOUT", "STOPPED"]:
            raise AttributeError("Job state is not supported.")

        # 1.0 for a successful run, 0.0 for any other terminal state.
        if job_status == "SUCCEEDED":
            metric_value = 1.0
        else:
            metric_value = 0.0

        # Publish one custom data point per job run to the "Glue" namespace.
        cloudwatch.put_metric_data(
            MetricData=[
                {
                    "MetricName": "JobStatus",
                    "Dimensions": [
                        {"Name": "JobName", "Value": job_name},
                        {"Name": "JobRunId", "Value": job_run_id},
                        {"Name": "JobStatus", "Value": job_status},
                    ],
                    "Unit": "None",
                    "Value": metric_value,
                }
            ],
            Namespace="Glue",
        )
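For reference, here is a minimal sketch of how the CloudWatch side of the comparison could be queried. It requests both the Sum and SampleCount statistics, since SampleCount (shown as "Sample count" in the console) counts reported data points, not the values they carry, so the two can differ. The dimension set (JobName, JobRunId set to "ALL", Type set to "count") is an assumption about how the built-in Glue job metrics are published and may need adjusting. The helper names `failed_tasks_query`, `summarize`, and `fetch_failed_tasks` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def failed_tasks_query(job_name, start, end, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": "Glue",
        "MetricName": "glue.driver.aggregate.numFailedTasks",
        # Assumed dimensions for built-in Glue job metrics; adjust if yours differ.
        "Dimensions": [
            {"Name": "JobName", "Value": job_name},
            {"Name": "JobRunId", "Value": "ALL"},
            {"Name": "Type", "Value": "count"},
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": period,
        "Statistics": ["Sum", "SampleCount"],
    }


def summarize(datapoints):
    """Total the reported values (Sum) and the number of reports (SampleCount)."""
    return {
        "failed_tasks": sum(p["Sum"] for p in datapoints),
        "reports": sum(p["SampleCount"] for p in datapoints),
    }


def fetch_failed_tasks(job_name, hours=24, region_name="eu-central-1"):
    """Query CloudWatch for the last `hours` hours (needs boto3 and credentials)."""
    import boto3  # imported here so the helpers above work without boto3

    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    cloudwatch = boto3.client("cloudwatch", region_name=region_name)
    response = cloudwatch.get_metric_statistics(
        **failed_tasks_query(job_name, start, end)
    )
    return summarize(response["Datapoints"])
```

Calling `fetch_failed_tasks("my-job")` (with a real job name in place of the placeholder) returns both totals side by side, which makes it easier to see which statistic lines up with the number shown in the Glue console.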

=======================

Any ideas?

asked a year ago · 46 views
No Answers
