Hello,
AWS Glue profiles jobs and sends metrics to CloudWatch every 30 seconds, and the AWS Glue Metrics Dashboard reports them once a minute. Metrics should therefore start to appear about one minute into the run, not only after the entire job has finished. You can use the glue.driver.aggregate.elapsedTime metric to determine how long a job run takes on average, and enable notifications using a CloudWatch alarm and SNS.
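As a rough sketch of the alarm setup described above, the following builds the arguments for CloudWatch's put_metric_alarm with boto3. The alarm name, the helper function names, and the threshold and period values are placeholders, not part of the original answer. The dimension set (JobName, JobRunId = "ALL", Type = "count") is my reading of the Glue metrics documentation and should be checked against what the CloudWatch console shows for your job. It is also worth confirming how the metric accumulates over a period before choosing a threshold.

```python
from typing import Any, Dict


def build_elapsed_time_alarm(job_name: str, topic_arn: str,
                             threshold_ms: float, period_s: int = 60) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm (hypothetical helper)."""
    return {
        "AlarmName": f"{job_name}-elapsed-time",  # placeholder naming scheme
        "Namespace": "Glue",
        "MetricName": "glue.driver.aggregate.elapsedTime",
        # Assumed dimension set; verify against the metric as listed in CloudWatch.
        "Dimensions": [
            {"Name": "JobName", "Value": job_name},
            {"Name": "JobRunId", "Value": "ALL"},
            {"Name": "Type", "Value": "count"},
        ],
        "Statistic": "Sum",
        "Period": period_s,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": float(threshold_ms),  # the metric is reported in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic that receives the notification
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Send the alarm definition to CloudWatch; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameter builder separate from the API call makes it easy to print and compare the dimensions against the console before creating anything.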
Apart from this, you can use the Glue job's built-in "Delay notification threshold" option: if the job runs longer than the specified time, Glue sends a delay notification via CloudWatch. [1][2]
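For anyone setting this up through the API rather than the console, here is a minimal sketch with boto3. It assumes the console option corresponds to NotificationProperty / NotifyDelayAfter (in minutes) on the job run, and that the delay notification arrives as a "Glue Job Run Status" event that an EventBridge rule can forward to SNS. The rule name, target id, and helper function names are made up for illustration, and the jobName field in the event detail is an assumption to verify against a real event.

```python
import json
from typing import Any, Dict


def build_delay_run_request(job_name: str, delay_minutes: int) -> Dict[str, Any]:
    """Arguments for glue.start_job_run with a delay notification threshold."""
    return {
        "JobName": job_name,
        "NotificationProperty": {"NotifyDelayAfter": delay_minutes},
    }


def build_delay_event_pattern(job_name: str) -> str:
    """EventBridge pattern for Glue job run status events of one job.

    The detail field name jobName is an assumption; check a sample event.
    """
    return json.dumps({
        "source": ["aws.glue"],
        "detail-type": ["Glue Job Run Status"],
        "detail": {"jobName": [job_name]},
    })


def wire_up(job_name: str, delay_minutes: int, topic_arn: str,
            rule_name: str = "glue-delay-rule") -> None:
    """Create the rule, point it at SNS, and start a run (needs AWS credentials)."""
    import boto3  # imported lazily so the builders above work without boto3

    events = boto3.client("events")
    events.put_rule(Name=rule_name, EventPattern=build_delay_event_pattern(job_name))
    # The SNS topic policy must also allow events.amazonaws.com to publish.
    events.put_targets(Rule=rule_name, Targets=[{"Id": "sns", "Arn": topic_arn}])
    boto3.client("glue").start_job_run(**build_delay_run_request(job_name, delay_minutes))
```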
References:
[1] https://docs.aws.amazon.com/glue/latest/dg/console-jobs.html
[2] https://docs.amazonaws.cn/en_us/glue/latest/dg/automating-awsglue-with-cloudwatch-events.html
Thank you. I tried this and was able to verify that the "Delay notification threshold" option works; I was able to send an email via SNS using it. Much thanks!
This is contrary to my experience. We have set our CloudWatch metric alarms (in Terraform) with the following variations and none of them are triggering. The metrics appear after job completion.

    namespace = "Glue"
    metric_name = "glue.driver.aggregate.elapsedTime"
    comparison_operator = "GreaterThanThreshold"
    datapoints_to_alarm = "1"
    evaluation_periods = "1"
    period = "60" and "300"
    threshold = "1000"
    statistic = "Sum" and "Maximum"
    dimensions = {
      JobName = "my_glue_job_name" or blank
      Type = "count" or "gauge"
      Value = "ALL" or blank
    }
The Glue job is configured with:

    create_event_alarms = "true"
    glue_version = "3.0"
    worker_type = "G.2X"
    alarm_actions = [module.sns_code.topic_arn]
    ok_actions = [module.sns_code.topic_arn]
    insufficient_data_actions = [module.sns_code.topic_arn]
And we are using the GlueContext:

    sc = SparkContext(conf=config)
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)

Options:

    "--enable-continuous-cloudwatch-log" = "true"
    "--enable-metrics" = "true"
Is the issue due to using Glue version 3.0?
Thanks for the suggestion on "Delay notification threshold". I will test this functionality today.