How do I use a Lambda function to receive SNS alerts when an AWS Glue job fails on retry?

I want to use an AWS Lambda function to receive an Amazon Simple Notification Service (Amazon SNS) alert when my AWS Glue job fails a retry.

Short description

You can create a Lambda function that checks the incoming event for a specific string. If the string in the event matches the string in the Lambda function, then the function publishes a message to Amazon SNS.

To receive an Amazon SNS notification when an AWS Glue job fails after a retry, create an Amazon SNS topic and subscription. Then, create a Lambda function that publishes to the topic. Finally, create an Amazon EventBridge rule that invokes the function when the job fails.

Resolution

Prerequisites:

  • An existing AWS Glue extract, transform, and load (ETL) job.
  • An AWS Identity and Access Management (IAM) role for Lambda that includes permission to publish Amazon SNS notifications.
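
As a rough illustration of the second prerequisite, the following Python sketch builds a minimal policy document that allows only sns:Publish on a single topic. The helper name build_sns_publish_policy and the topic ARN are hypothetical placeholders. A Lambda execution role typically also needs permission to write to Amazon CloudWatch Logs, which this sketch omits.

```python
import json


def build_sns_publish_policy(topic_arn: str) -> dict:
    """Return an IAM policy document that allows publishing to one SNS topic."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": topic_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only; replace it with your topic's ARN.
    policy = build_sns_publish_policy("arn:aws:sns:us-west-2:123456789012:MyTopic")
    print(json.dumps(policy, indent=2))
```

You could paste the printed JSON into the IAM console's policy editor and attach the resulting policy to the Lambda execution role.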

Create an Amazon SNS topic and subscription

Complete the following steps:

  1. Open the Amazon SNS console.
  2. Choose Topics, and then choose Create topic.
  3. For Type, choose Standard.
  4. For Name, enter a name for your topic.
  5. (Optional) For Display name, enter the display name for your topic.
  6. Choose Create topic.
  7. On the Topic page, choose Create subscription, and then complete the following steps: 
    For Topic ARN, choose your topic. 
    For Protocol, choose the notification method, such as Email.
    For Endpoint, enter the address where you want to receive the Amazon SNS notifications.
  8. Choose Create subscription.
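
If you prefer to script these console steps, the following is a minimal sketch that uses the boto3 SNS client. The helper name, topic name, and email address are assumptions for illustration. The client is passed in as an argument so that the helper can be exercised without AWS access.

```python
def create_topic_with_email_subscription(sns_client, topic_name: str, email: str) -> str:
    """Create a standard SNS topic, subscribe an email address, and return the topic ARN."""
    # create_topic returns the existing topic's ARN when a topic with the
    # same name and attributes already exists.
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn


if __name__ == "__main__":
    import boto3  # imported here so the helper above has no hard dependency on boto3

    # Hypothetical topic name and address; replace them with your own values.
    arn = create_topic_with_email_subscription(
        boto3.client("sns"), "glue-retry-failures", "you@example.com"
    )
    print(arn)
```

An email endpoint receives notifications only after the recipient confirms the subscription from the confirmation message that Amazon SNS sends.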

Create an AWS Lambda function

Complete the following steps:

  1. Open the Lambda console.

  2. Choose Create function.

  3. On the Create function page, complete the following steps:
    Select Author from scratch.
    For Function name, enter a name for your function.
    For Runtime, choose an available Python version.
    Expand the Change default execution role dropdown list.
    For Execution role, choose Use an existing role.
    For Existing role, select the IAM role with permission to send Amazon SNS notifications.

  4. Choose Create function.

  5. On the Code tab, in the Code source section, choose File, and then choose New file.

    Name the file, and then enter the following code:

    import json
    import logging
    import boto3
    
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    
    # Create the SNS client once, outside the handler, so that warm invocations reuse it.
    client = boto3.client('sns')
    
    SNS_TOPIC_ARN = "REPLACE_WITH_YOUR_SNS_TOPIC_ARN"
    
    def lambda_handler(event, context):
        logger.info(f"## INITIATED BY EVENT: \n{event['detail']}")
    
        glue_job_name = event['detail']['jobName']
        jobrun_id = event['detail']['jobRunId']
    
        # The first retry of a job run has an ID that ends with "_attempt_1".
        if jobrun_id.endswith('_attempt_1'):
            logger.info(f'## GLUE JOB FAILED RETRY: {glue_job_name}')
            message = (
                "A Glue job failed after retrying.\n"
                f"Job name: {glue_job_name}\n"
                f"JobRun ID: {jobrun_id}"
            )
            client.publish(
                TargetArn=SNS_TOPIC_ARN,
                # With MessageStructure='json', the 'default' value is the plain
                # message string. Encoding it a second time would deliver quoted
                # text with literal "\n" sequences.
                Message=json.dumps({'default': message}),
                Subject='AWS Glue Job Retry Failure Notification',
                MessageStructure='json'
            )

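    Note: Replace REPLACE_WITH_YOUR_SNS_TOPIC_ARN with the ARN of your Amazon SNS topic. The function checks for the _attempt_1 suffix, so it sends a notification only when the first retry fails.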

  6. Choose File, and then choose Save.

  7. For Filename, enter a file name.

  8. Choose Deploy.
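
The handler above matches only the first retry. If your job allows more than one retry, a check like the following sketch might be a better fit. It assumes that every retry's job run ID ends with _attempt_N, which is inferred from the _attempt_1 suffix used in this article and isn't confirmed here for higher attempt numbers. The function names and the max_retries parameter are hypothetical.

```python
import re

# Assumed suffix format, inferred from the "_attempt_1" ID used in this article.
_ATTEMPT_SUFFIX = re.compile(r"_attempt_(\d+)$")


def retry_attempt(job_run_id: str) -> int:
    """Return the retry attempt encoded in a job run ID, or 0 for the initial run."""
    match = _ATTEMPT_SUFFIX.search(job_run_id)
    return int(match.group(1)) if match else 0


def is_final_retry_failure(event: dict, max_retries: int = 1) -> bool:
    """Return True when the event is a FAILED state change for the last allowed retry."""
    detail = event.get("detail", {})
    return (
        detail.get("state") == "FAILED"
        and retry_attempt(detail.get("jobRunId", "")) >= max_retries
    )
```

In the handler, you could replace the endswith check with is_final_retry_failure(event, max_retries=N), where N matches the Number of retries setting of your AWS Glue job.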

(Optional) To test the event, complete the following steps:

  1. Choose the Test tab.

  2. For Event name, enter a name for the test event.

    In the event's JSON body, enter the following code:

    {
        "version": "0",
        "id": "abcdef01-1234-5678-9abc-def012345678",
        "detail-type": "Glue Job State Change",
        "source": "aws.glue",
        "account": "123456789012",
        "time": "2017-09-07T06:02:03Z",
        "region": "us-west-2",
        "resources": [],
        "detail": {
            "jobName": "MyTestJob",
            "severity": "ERROR",
            "state": "FAILED",
            "jobRunId": "jr_0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef_attempt_1",
            "message": "JobName:MyTestJob and JobRunId:jr_0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef failed to execute with exception Role arn:aws:iam::123456789012:role/Glue_Role should be given assume role permissions for Glue Service."
        }
    }

    Note: Replace MyTestJob with your AWS Glue job name.

  3. Choose Save changes, and then choose Test.

  4. View the Execution result that appears.

  5. Verify that you receive an Amazon SNS alert.
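
The same test can be run from a script. The following sketch builds a reduced version of the sample event and invokes the function synchronously with the boto3 Lambda client. The helper names and the function name are hypothetical, and the event includes only the fields that the handler reads plus a few from the sample.

```python
import json


def build_test_event(job_name: str, attempt: int = 1) -> dict:
    """Build a minimal Glue Job State Change event modeled on the sample event."""
    run_id = "jr_" + "0123456789abcdef" * 4
    if attempt > 0:
        run_id += f"_attempt_{attempt}"
    return {
        "version": "0",
        "detail-type": "Glue Job State Change",
        "source": "aws.glue",
        "detail": {
            "jobName": job_name,
            "severity": "ERROR",
            "state": "FAILED",
            "jobRunId": run_id,
            "message": "Sample failure message",
        },
    }


def invoke_with_test_event(lambda_client, function_name: str, event: dict) -> int:
    """Invoke the function synchronously and return the HTTP status code."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(event).encode("utf-8"),
    )
    return response["StatusCode"]


if __name__ == "__main__":
    import boto3  # imported here so the helpers above can be tested without AWS access

    # "my-glue-alert-function" is a placeholder for your function's name.
    status = invoke_with_test_event(
        boto3.client("lambda"), "my-glue-alert-function", build_test_event("MyTestJob")
    )
    print(status)
```

Calling build_test_event with attempt=0 produces a run ID without the suffix, which is a quick way to confirm that the handler stays silent for a first-run failure.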

Use Amazon EventBridge to initiate email notifications

Complete the following steps:

  1. Open the EventBridge console.

  2. In the navigation pane, choose Rules, and then choose Create rule.

  3. On the Create rule page, complete the following steps:
    For Name, enter the rule name.
    (Optional) For Description, enter the rule's description.
    For Define pattern, choose Event pattern.
    For Event matching pattern, choose Custom pattern.
    For Event pattern, enter the following pattern or your own:

    {
      "detail-type": ["Glue Job State Change"],
      "source": ["aws.glue"],
      "detail": {
        "state": ["FAILED"]
      }
    }

  4. Choose Save.

  5. In the Select targets section, complete the following steps:
    For Target, choose Lambda function.
    For Function, choose your Lambda function.

  6. Choose Next, and then choose Create rule.
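
For reference, the rule and target can also be created with boto3. The following is a sketch under stated assumptions: the helper name, target ID, and statement ID are hypothetical, and both clients are passed in as arguments. Unlike the console flow, a scripted setup must grant EventBridge permission to invoke the function explicitly.

```python
import json

# Same event pattern as in the console steps.
EVENT_PATTERN = {
    "detail-type": ["Glue Job State Change"],
    "source": ["aws.glue"],
    "detail": {"state": ["FAILED"]},
}


def create_glue_failure_rule(events_client, lambda_client, rule_name: str, function_arn: str) -> str:
    """Create the rule, add the Lambda function as its target, and return the rule ARN."""
    rule_arn = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(EVENT_PATTERN),
        State="ENABLED",
    )["RuleArn"]
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "glue-failure-lambda", "Arn": function_arn}],
    )
    # Allow EventBridge to invoke the function, scoped to this rule only.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

You would call it with boto3.client("events") and boto3.client("lambda"). Without the add_permission call, the rule matches events but the invocation is denied.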

Test a failure with your AWS Glue job

Complete the following steps:

  1. Open the AWS Glue console.
  2. In the navigation pane, choose Jobs, and then select the AWS Glue job that you want to test.
  3. Choose the Action dropdown list, and then choose Edit job.
  4. Expand Security configuration, script libraries, and job parameters (optional).
  5. Under Security configuration, for Number of retries, enter 1.
  6. Choose Save.
  7. On the Jobs page, select the AWS Glue job.
  8. Choose the Action dropdown list, and then choose Edit script.
  9. Change an element of your code so that the job fails. For example, add "_BROKEN" to a table name.
  10. Choose Save.
  11. On the Jobs page, select the AWS Glue job.
  12. Choose the Action dropdown list, and then choose Run job.

After the AWS Glue job fails a second time (the initial run and then the retry), check your configured endpoint to verify that you received the Amazon SNS notification. Then, revert the change that you made to your code to make the job fail.
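
To start the test run from a script instead of the console, a sketch such as the following might help. It reads the job's MaxRetries value with the boto3 Glue client and starts a run only when the value matches the expected retry count. The helper name and the expected_retries parameter are hypothetical, and the Glue client is passed in as an argument.

```python
def start_job_if_retry_configured(glue_client, job_name: str, expected_retries: int = 1) -> str:
    """Start the job and return the run ID, after checking the job's MaxRetries setting."""
    job = glue_client.get_job(JobName=job_name)["Job"]
    max_retries = job.get("MaxRetries", 0)
    if max_retries != expected_retries:
        raise ValueError(
            f"Job {job_name} has MaxRetries={max_retries}, expected {expected_retries}"
        )
    return glue_client.start_job_run(JobName=job_name)["JobRunId"]


if __name__ == "__main__":
    import boto3  # imported here so the helper above can be tested without AWS access

    # "MyTestJob" is a placeholder for your AWS Glue job name.
    print(start_job_if_retry_configured(boto3.client("glue"), "MyTestJob"))
```

The check guards against a common cause of a missing notification: with zero retries configured, no run ID ends in _attempt_1, so the Lambda function never publishes.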

Related information

Events in Amazon EventBridge

Setting up Amazon SNS notifications

Troubleshooting Amazon EventBridge

2 Comments

Any tips for setting up an integration between AWS and ServiceNow? For example, to open a ServiceNow ticket automatically when an AWS Glue job fails.

answered a year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
answered a year ago