Using Bedrock Agent For CloudWatch Alarm to quickly get started with AWS service Alerts

10 minute read
Content level: Advanced

For new AWS users, what's the quickest way to start configuring CloudWatch Alarms?

How do we select the right metrics for AWS managed services to monitor?

Bedrock Agent For CloudWatch Alarm:

Enables customers to learn alarm configuration through natural language interaction and to set up alarms quickly.

Simplifies the CloudWatch Alarm configuration process.

Automates alarm creation, including metric selection, threshold setting, and evaluation period. Users can view and edit the created alarms.

1.Tool Introduction

Tool: Bedrock Agent For CloudWatch Alarm
Use Case: Empower customers to easily learn and configure CloudWatch Alarms using natural language
Function 1: Provide recommendations for alarm configuration and explanations
Function 2: Create CloudWatch alarms with interactive modifications
Function 3: Send metric test data and notify via SNS and Chime
Function 4: Provide similar AWS CLI commands for creating alarms
Limitation 1: Only supports creating an alarm on a single metric and dimension; batch creation is not supported
Limitation 2: Validation of metric accuracy is not supported; manual verification is required
Used Services: Bedrock Agent, Claude 3 Haiku, Lambda, CloudWatch Alarms & Logs, SNS

2.Partial usage examples

3.Solution architecture

  • Deployed in a serverless architecture, which helps reduce costs and lower the operational overhead

  • Leverages the Bedrock Agent to provide ReAct capabilities (reasoning + action) along with an event-driven approach, enabling auto scaling

  • Alarm testing and notification delivery go through an SNS topic. This allows connecting to multiple subscription channels such as email and instant messaging. In this solution, the Chime webhook can be replaced as required

4.Solution deployment

4.1Create SNS and subscription

  • The access policy must allow CloudWatch to publish messages to the topic. Replace the account ID in the resource ARN with your own.

  • Creating subscriptions:

    If you want to send to email, select the "email" protocol. If you want to send to Chime or similar instant messaging, select the "LAMBDA" protocol. You will need to first create a Lambda function.

  • Creating the Lambda function to send messages to Chime:

    Runtime should be Python. Use the provided sample code for the function.

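import json

import urllib3

# Reuse one connection pool across invocations of the same Lambda container.
http = urllib3.PoolManager()


def lambda_handler(event, context):
    # Replace with the webhook URL copied from your Chime chat room.
    url = "https://hooks.chime.aws/incomingwebhooks/*********?token=********"
    # SNS delivers the alarm notification as a string in Records[0].Sns.Message.
    message = event["Records"][0]["Sns"]["Message"]
    encoded_msg = json.dumps({"Content": message}).encode("utf-8")
    resp = http.request(
        "POST",
        url,
        body=encoded_msg,
        headers={"Content-Type": "application/json"},
    )
    # Log the outcome so delivery problems show up in CloudWatch Logs.
    print(
        {
            "message": message,
            "status_code": resp.status,
            "response": resp.data,
        }
    )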
  • Replace the url in the code with the Chime webhook URL. You can create this webhook from the Chime chat room and copy-paste it into the code.

  • Modify the Lambda function configuration and set the timeout to 30 seconds.

  • Select the SNS topic that was just created as the trigger for the Lambda function.
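
The console steps in this section can also be scripted. The following is a minimal sketch, not the article's own method: it assumes a boto3 SNS client is passed in, and the function names `build_topic_policy` and `create_alarm_topic`, the statement ID, and the `aws:SourceArn` condition are illustrative assumptions that follow the common pattern of letting the `cloudwatch.amazonaws.com` service principal publish to the topic.

```python
import json


def build_topic_policy(topic_arn):
    """Return an SNS access policy (JSON string) that lets CloudWatch alarms
    in the topic's own account and region publish to the topic."""
    # An SNS topic ARN looks like arn:<partition>:sns:<region>:<account-id>:<name>.
    _, partition, _, region, account_id, _ = topic_arn.split(":")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudWatchAlarmsToPublish",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"Service": "cloudwatch.amazonaws.com"},
                "Action": "sns:Publish",
                "Resource": topic_arn,
                "Condition": {
                    "ArnLike": {
                        "aws:SourceArn": f"arn:{partition}:cloudwatch:{region}:{account_id}:alarm:*"
                    }
                },
            }
        ],
    }
    return json.dumps(policy)


def create_alarm_topic(sns_client, name, email=None, lambda_arn=None):
    """Create the topic, attach the access policy, and add optional subscriptions."""
    topic_arn = sns_client.create_topic(Name=name)["TopicArn"]
    sns_client.set_topic_attributes(
        TopicArn=topic_arn,
        AttributeName="Policy",
        AttributeValue=build_topic_policy(topic_arn),
    )
    if email:
        # The recipient has to confirm an email subscription before it delivers.
        sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    if lambda_arn:
        sns_client.subscribe(TopicArn=topic_arn, Protocol="lambda", Endpoint=lambda_arn)
    return topic_arn
```

With credentials configured, `create_alarm_topic(boto3.client("sns"), "cw-alarm", email="you@example.com")` would be one way to call it (the topic name and address are placeholders). When a Lambda function is subscribed through the API instead of being added as a console trigger, the function also needs a resource-based permission that allows SNS to invoke it.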

4.2Create Bedrock Agent

This solution utilizes the Claude 3 Haiku model. Please ensure the access permissions for the Claude 3 Haiku model are granted in the Bedrock model access settings. Create Agent:

  • Agent IAM role: Create a new service role

  • Model: Anthropic - Claude 3 Haiku

  • Others: User Input - choose 'Enable'

  • Agent Instructions: Make sure to replace the SNS topic ARN in the following content with the ARN of the topic you just created. It looks similar to 'arn:aws:sns:us-east-1:******:cw-alarm' and appears in two places.

Your role is to create CloudWatch alarms for AWS services. Please provide detailed information about the service and metric. Generate the AWS CLI command if needed. Directly execute the necessary functions to create the alarm; there is no need to return control back to the agent.
<thinking>
Determine the user's needs; Gather user requirements; Collect and assemble alarm parameters; Create alarm; (Optional) Send metric test data to trigger the alarm
</thinking>
process step-by-step:
<step0>
Determine the user's needs: 1. If the user wants an AWS CLI command, skip to analyzing their needs and generate the command directly. 2. If the user wants to send test data, find the newly created alarm and skip to step5. 3. If the user asks for suggested metrics to set up alarms for AWS services, or inquires about the available metrics for a specific service, skip step1-5, provide a list of commonly monitored metrics along with their explanations, and respond to the user.
</step0>
<step1>
Understand what the user wants to monitor in AWS
<example>
User: I want to set up alerts for my EC2 instances
Agent: Let's find the right metrics. Would you like to monitor CPU usage, network traffic, or something else?
</example>
</step1>
<step2>
Specify the exact dimension (e.g., instance ID for EC2) and the desired threshold (e.g., CPU utilization > 80%). Do not provide "AlarmActions" and "OKActions" parameters; I'll set default values.
<example>
User: I want to be alerted when my EC2 instance's CPU gets too high
Agent: Which specific EC2 instance are we monitoring? And at what percentage CPU usage should we trigger the alert?
</example>
</step2>
<step3>
Construct alarm parameters based on user-specified metrics and CloudWatch Alarm requirements
<param>
Prepare the alarm parameters in JSON format, referring to the provided example. Pass the formatted JSON string to the put_alarm_data parameter.
<example>
put_alarm_data = {
"AlarmName": "CPUUtilizationAlarm_$INSTANCE_ID",
"MetricName": "CPUUtilization",
"Threshold": 70,
"ComparisonOperator": "GreaterThanThreshold",
"Dimensions": [{"Name": "InstanceId", "Value": "$INSTANCE_ID"}],
"AlarmActions": ["arn:aws:sns:us-east-1:******:cw-alarm"],
...
}
</example>
AlarmName: should be a combination of the metric name and the dimension ID (e.g., instance ID). The dimension ID, represented by $INSTANCE_ID, is a variable that must be provided by the user. If the metric does not require a dimension ID, generate a 6-digit random string and append it to the metric name.
Dimensions: typically EC2 instance IDs, RDS instance IDs, etc. If the metric has a Dimensions field, the dimension value must be provided and cannot be fabricated.
AlarmActions: a default value of 'arn:aws:sns:us-east-1:******:cw-alarm'. Ensure this default value is passed to the function and should not be provided by the user.
OKActions: Do not provide a value for the OKActions parameter. It should be left empty.
Other parameters: Refer to the CloudWatch put metric alarm API for additional parameters.
</param>
Do not use fictitious values like $variable, replace all variables with actual values
</step3>
<step4>
Use the put-metric-alarm function to create CloudWatch alarm based on assembled parameters. If creation fails, guide the user to use the AWS CLI for troubleshooting.
</step4>
<step5>
(Optional) Send metric test data to CloudWatch based on the newly created alarm. This data should be a metric and its value, designed to exceed the alarm's threshold. For instance, if the alarm triggers at 70% CPU utilization, send a value like 80%. Consider the alarm's EvaluationPeriods setting; multiple data points might be needed if it's greater than 1.
<param>
Prepare the metric data parameters in JSON format, referring to the provided example. Pass the formatted JSON string to the put_metric_data parameter.
<example>
put_metric_data = {
        "Namespace": "AWS/EC2",
        "MetricData": [
            {
                "MetricName": "CPUUtilization",
                "Dimensions": [
                    {
                        "Name": "InstanceId",
                        "Value": "i-1234567890abcdef0"
                    }
                ],
                "Value": 90,
                "Unit": "Percent"
            }
        ]
}
</example>
MetricData: the metric data sent to CloudWatch
MetricName: must match the metric name used when creating the alarm
Dimensions: must match the dimensions used when creating the alarm
Other parameters: refer to the CloudWatch put metric data API for additional parameters.
</param>
If it fails, provide the AWS CLI command and recommend using the CLI for testing.
</step5>

Click Save.
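
To make the parameter rules in step3 and step5 concrete, here is a small sketch of the two JSON strings the agent is asked to assemble. The helper names `build_put_alarm_data` and `build_put_metric_data`, the default `Statistic`, `Period`, and `margin` values, and the sample ARN are illustrative assumptions and are not part of the agent instructions; the keys themselves are standard CloudWatch `PutMetricAlarm` and `PutMetricData` parameters.

```python
import json
import random
import string


def build_put_alarm_data(namespace, metric_name, threshold, sns_topic_arn,
                         dimension_name=None, dimension_value=None,
                         evaluation_periods=1, period=60):
    """Assemble the put_alarm_data JSON string following the step3 naming rules."""
    if dimension_value:
        # AlarmName combines the metric name and the dimension ID.
        suffix = dimension_value
        dimensions = [{"Name": dimension_name, "Value": dimension_value}]
    else:
        # No dimension: append a 6-character random string instead.
        suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=6))
        dimensions = []
    alarm = {
        "AlarmName": f"{metric_name}Alarm_{suffix}",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Average",            # assumed default for this sketch
        "Period": period,                  # seconds per evaluation period
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "Dimensions": dimensions,
        "AlarmActions": [sns_topic_arn],   # OKActions is intentionally left out
    }
    return json.dumps(alarm)


def build_put_metric_data(put_alarm_data, unit="Percent", margin=10):
    """Derive step5 test data: same metric and dimensions, value above the threshold."""
    alarm = json.loads(put_alarm_data)
    datum = {
        "MetricName": alarm["MetricName"],
        "Dimensions": alarm["Dimensions"],
        "Value": alarm["Threshold"] + margin,
        "Unit": unit,
    }
    return json.dumps({"Namespace": alarm["Namespace"], "MetricData": [datum]})


alarm_json = build_put_alarm_data(
    "AWS/EC2", "CPUUtilization", 70,
    "arn:aws:sns:us-east-1:111122223333:cw-alarm",   # placeholder ARN
    dimension_name="InstanceId", dimension_value="i-1234567890abcdef0",
)
print(alarm_json)
print(build_put_metric_data(alarm_json))
```

Both strings parse with `json.loads`, which is what the action group Lambda function in this article does before calling CloudWatch.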

  • Action Group: Add

  • Choose to create a new Lambda function; you will modify that function in a later step

  • Add two functions in the action group:

  • Function1: put-metric-alarm. Creates a CloudWatch alarm. The Lambda code in this section reads its input from a parameter named put_alarm_data

  • Function2: put-metric-data. Sends test metric data. The Lambda code in this section reads its input from a parameter named put_metric_data

Click Save.

  • After the function has been created automatically, open the Lambda function from the action group and modify its code

  • Code reference as follows:
import boto3
import json

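def put_metric_alarm(put_alarm_data):
    """Create a CloudWatch alarm from a JSON string of put_metric_alarm arguments."""
    print('put-metric-alarm')
    cloudwatch = boto3.client('cloudwatch')
    try:
        # Parse inside the try block so that malformed JSON from the model is
        # reported back to the agent instead of crashing the function.
        alarm_data = json.loads(put_alarm_data)
        response = cloudwatch.put_metric_alarm(**alarm_data)
        print(response)
        return {
            'statusCode': 200,
            'msg': 'alarm created successfully'
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'msg': f'alarm creation failed, error: {str(e)}'
        }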

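def put_metric_data(metric_data_json):
    """Send test metric data from a JSON string of put_metric_data arguments."""
    print('put_metric_data')
    cloudwatch = boto3.client('cloudwatch')
    try:
        # Parse inside the try block so that malformed JSON from the model is
        # reported back to the agent instead of crashing the function.
        metric_data = json.loads(metric_data_json)
        response = cloudwatch.put_metric_data(**metric_data)
        print(response)
        return {
            'statusCode': 200,
            'msg': 'metric data sent successfully'
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'msg': f'sending metric data failed, error: {str(e)}'
        }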


def lambda_handler(event, context):
    agent = event['agent']
    actionGroup = event['actionGroup']
    function = event['function']
    parameters = event.get('parameters', [])

    print('event:{}'.format(event))
    params = {
        'put_alarm_data': '',
        'put_metric_data': ''
    }
    for param in parameters:
        if param['name'] == 'put_alarm_data':
            params['put_alarm_data'] = param['value']
        elif param['name'] == 'put_metric_data':
            params['put_metric_data'] = param['value']

    print('params:{}'.format(params))

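    # Dispatch on the function name configured in the action group.
    if function == 'put-metric-alarm':
        response = put_metric_alarm(params['put_alarm_data'])
    elif function == 'put-metric-data':
        response = put_metric_data(params['put_metric_data'])
    else:
        # Tell the agent explicitly when it asks for a function that does not exist.
        response = {'statusCode': 400, 'msg': f'unknown function: {function}'}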

    responseBody =  {
        "TEXT": {
            "body": json.dumps(response)
        }
    }
    print(responseBody)

    action_response = {
        'actionGroup': actionGroup,
        'function': function,
        'functionResponse': {
            'responseBody': responseBody
        }
    }
    dummy_function_response = {'response': action_response, 'messageVersion': event['messageVersion']}
    print("Response: {}".format(dummy_function_response))

    return dummy_function_response

After modifying the code, save it and click Deploy.
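
Before testing through the agent, it can help to see the shape of the event the handler receives and the response it is expected to return. The sketch below mirrors the fields the handler above reads (`actionGroup`, `function`, `parameters`, `messageVersion`). The helper names, the agent metadata, and the action group name are placeholders chosen for illustration, and the sample event is hand-written rather than captured from a real invocation.

```python
import json


def make_agent_event(function, param_name, payload):
    """Build a sample event resembling what Bedrock Agent passes to the Lambda."""
    return {
        "messageVersion": "1.0",
        "agent": {"name": "cw-alarm-agent", "id": "AGENT_ID",
                  "alias": "ALIAS_ID", "version": "DRAFT"},   # placeholder metadata
        "inputText": "Create an alarm for EC2 CPU utilization",
        "sessionId": "local-test-session",
        "actionGroup": "cw-alarm-actions",                     # placeholder name
        "function": function,
        "parameters": [
            {"name": param_name, "type": "string", "value": json.dumps(payload)}
        ],
        "sessionAttributes": {},
        "promptSessionAttributes": {},
    }


def extract_parameters(event):
    """Flatten the parameter list into a name -> value dictionary."""
    return {p["name"]: p["value"] for p in event.get("parameters", [])}


def build_function_response(event, result):
    """Wrap a result dictionary in the response envelope the handler returns."""
    return {
        "messageVersion": event["messageVersion"],
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {
                "responseBody": {"TEXT": {"body": json.dumps(result)}}
            },
        },
    }


event = make_agent_event(
    "put-metric-alarm", "put_alarm_data",
    {"AlarmName": "CPUUtilizationAlarm_i-1234567890abcdef0", "Threshold": 70},
)
params = extract_parameters(event)
print(json.loads(params["put_alarm_data"])["AlarmName"])
print(build_function_response(event, {"statusCode": 200, "msg": "ok"}))
```

Passing an event like this to `lambda_handler` in the Lambda console's Test tab exercises the same code path as the agent, although it will call CloudWatch with whatever payload is supplied.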

  • Modify the function configuration and set the timeout to 30 seconds

  • Modify the permissions of the function's execution role to grant CloudWatch permissions
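
The two configuration changes above can be sketched in code as well. This is an assumption-laden example rather than the article's exact setup: the article only says to grant CloudWatch permissions, so the narrow policy below (just the two actions the Lambda code calls) is one possible choice, and the function name `configure_action_function` and the policy name are hypothetical.

```python
import json


def build_cloudwatch_policy():
    """Inline policy covering only the two CloudWatch calls the function makes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["cloudwatch:PutMetricAlarm", "cloudwatch:PutMetricData"],
                "Resource": "*",
            }
        ],
    }


def configure_action_function(lambda_client, iam_client, function_name, role_name):
    """Set the 30 second timeout and attach the inline policy to the execution role."""
    lambda_client.update_function_configuration(
        FunctionName=function_name, Timeout=30
    )
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName="cw-alarm-agent-cloudwatch",   # hypothetical policy name
        PolicyDocument=json.dumps(build_cloudwatch_policy()),
    )
```

The clients would typically be `boto3.client("lambda")` and `boto3.client("iam")`; the function and role names are the ones generated when the action group created the Lambda function.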

4.3Test Agent

  • Click 'Edit in Agent Builder'

  • Confirm the information in the agent, including the model selection, description, other settings, and action group. If everything is correct, first click 'Save', then click 'Prepare'.

  • Test the agent in the test panel on the right side of the page

  • You can use the following prompts to test. After the agent responds, continue the conversation:

Suggest which metrics to set up alarms for Lambda?

Create an alarm for Lambda errors

What alarms do you suggest creating for RDS?

What are the corresponding metrics for these alarms?

Create an alarm for RDS database connections

For testrds1, create an alarm when connections exceed 1000

Create an EC2 alarm

Create an alarm for EC2 instance i-0123456789 system status check failure

Send test data

I would like the evaluation period for this alarm to be changed to 2 consecutive minutes

What is the similar AWS CLI command to create this alarm?

Create an alarm for EC2 system status check failure, just provide the AWS CLI, no need to create it
  • Customers can also use a UI tool such as Streamlit to test.
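
Any custom front end, Streamlit or otherwise, ends up calling the agent through the runtime API. The sketch below shows one way to do that, assuming a `bedrock-agent-runtime` boto3 client is passed in. The helper name `ask_agent` is hypothetical, and the agent ID and alias ID are values you would copy from your own agent.

```python
import uuid


def ask_agent(runtime_client, agent_id, agent_alias_id, prompt, session_id=None):
    """Send one prompt to the agent and return (session_id, reply_text).

    Reusing the returned session_id on the next call keeps the conversation
    context, which the multi-turn test prompts above rely on.
    """
    session_id = session_id or str(uuid.uuid4())
    response = runtime_client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        inputText=prompt,
    )
    parts = []
    # The reply arrives as an event stream; text is carried in "chunk" events.
    for stream_event in response["completion"]:
        chunk = stream_event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return session_id, "".join(parts)
```

A possible call, with placeholder IDs, is `ask_agent(boto3.client("bedrock-agent-runtime"), "YOUR_AGENT_ID", "YOUR_ALIAS_ID", "Create an alarm for Lambda errors")`.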

  • Created alarms can be viewed on the CloudWatch Alarms page. To check that messages reach the alert channel, ask the Agent to send the test metric data that it generates automatically for the alarm.
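
The same check can be made from code instead of the console. This is a minimal sketch that assumes a boto3 CloudWatch client is passed in; the helper names `alarm_states` and `alarms_in_alarm` are hypothetical, and the alarm name in the usage note is a placeholder.

```python
def alarm_states(cloudwatch_client, alarm_names):
    """Return a mapping of alarm name -> current state for metric alarms."""
    response = cloudwatch_client.describe_alarms(AlarmNames=list(alarm_names))
    return {
        alarm["AlarmName"]: alarm["StateValue"]
        for alarm in response.get("MetricAlarms", [])
    }


def alarms_in_alarm(cloudwatch_client, alarm_names):
    """List the given alarms that are currently in the ALARM state."""
    states = alarm_states(cloudwatch_client, alarm_names)
    return sorted(name for name, state in states.items() if state == "ALARM")
```

For example, `alarm_states(boto3.client("cloudwatch"), ["CPUUtilizationAlarm_i-1234567890abcdef0"])` reports whether the alarm is in `OK`, `ALARM`, or `INSUFFICIENT_DATA` after the test data has been sent.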

  • Afterwards, you can view and edit the automatically created Alarms to meet the actual production requirements

5.Summary

The Bedrock Agent provides large language model reasoning and action capabilities, allowing it to automatically complete business operations based on customer requirements. This demonstration used a Bedrock Agent to build an intelligent alarm assistant for CloudWatch Alarms. The entire solution is deployed in a serverless architecture, reducing both cost and operational workload. Based on a cost estimate, $1 covers the automatic configuration of about 200-400 alarms.

With this tool, customers can quickly get started with alarm configuration. After the alarms are created, they can view and edit them in the console to meet actual production requirements.