Monitor Dynamic Prefixes advertised from Virtual Router Appliances to Transit Gateway Connect Peers Using CloudWatch

14 minute read
Content level: Advanced

This article details how to use Amazon CloudWatch and serverless compute with AWS Lambda to monitor the number of BGP prefixes advertised by a virtual appliance to a Transit Gateway Connect peer.

AWS Transit Gateway Connect enables customers to connect SD-WAN and virtual router appliances to AWS using GRE tunnels and BGP, simplifying network connectivity without IPsec VPNs. To operate these connections effectively, customers need visibility into the number of prefixes advertised by virtual appliances to Transit Gateway Connect peers. Without this visibility, several risks can arise:

  • Customers can unknowingly exceed the default service quota of 1,000 prefixes per Connect peer
  • Transit Gateway route tables only display a maximum of 1,000 routes in the console, masking potential issues
  • Without proactive monitoring, prefix limit violations can cause route drops and connectivity failures
  • Manual verification through CLI or API calls is time-consuming and not scalable

This article will demonstrate how to automate monitoring of dynamic prefix counts using CloudWatch metrics, AWS Lambda, EventBridge, and S3.

Solution Benefits

This automated solution provides:

  • Proactive Monitoring: Visibility into prefix counts at 5-minute intervals, before service quotas are reached
  • Scalability: Automatically discovers and monitors all Connect peers across Transit Gateways
  • Alerting Capability: Integration with CloudWatch Alarms for threshold-based notifications
  • Operational Efficiency: Eliminates manual monitoring and reduces troubleshooting time
  • Capacity Planning:
    • Identify growth trends in route advertisements
    • Plan for quota increases before reaching limits
    • Optimize routing policies based on historical patterns

Solution Overview

This solution leverages serverless AWS services to automatically monitor and publish prefix counts to CloudWatch. It is packaged as an AWS CloudFormation template which deploys a Lambda function triggered every 5 minutes via EventBridge to monitor Transit Gateway Connect peer route counts.

Figure 1. Event processing architecture diagram

  • Automated Discovery: The Lambda function lists the available Transit Gateways in the Region, their route tables, and the Connect attachments that propagate routes to each route table
  • Route Export: Exports active, propagated routes from those route tables to S3 using the ExportTransitGatewayRoutes API
  • Prefix Counting: Parses the exported route data, keeps the routes learned through Connect attachments, and counts the unique prefixes advertised by each Connect peer
  • Metric Publishing: Publishes prefix counts to CloudWatch under the custom namespace TgwConnectPropagatedRouteCount with dimensions for Connect Attachment ID and Connect Peer ID
  • Automatic Cleanup: S3 bucket lifecycle policy deletes exported route files after 1 day to minimize storage costs
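The prefix-counting step can be illustrated in isolation. The sketch below assumes an exported route file shaped the way the Lambda function in the template parses it (a top-level `routes` list whose entries carry `destinationCidrBlock` and `transitGatewayAttachments`). The function name, sample data, and shortened peer IDs are hypothetical and are not part of the deployed solution.

```python
from collections import defaultdict


def count_prefixes_per_peer(export):
    """Return {connect_peer_id: number of unique prefixes} for Connect routes."""
    prefixes = defaultdict(set)
    for route in export.get("routes", []):
        cidr = route.get("destinationCidrBlock")
        for attachment in route.get("transitGatewayAttachments", []):
            # Only routes learned through Connect attachments are counted.
            if attachment.get("resourceType") == "connect":
                prefixes[attachment.get("resourceId")].add(cidr)
    return {peer: len(cidrs) for peer, cidrs in prefixes.items()}


# Hypothetical sample: one duplicate prefix and one non-Connect (VPC) route.
sample_export = {
    "routes": [
        {"destinationCidrBlock": "10.0.0.0/24",
         "transitGatewayAttachments": [
             {"resourceType": "connect", "resourceId": "tgw-connect-peer-aaaa"}]},
        {"destinationCidrBlock": "10.0.1.0/24",
         "transitGatewayAttachments": [
             {"resourceType": "connect", "resourceId": "tgw-connect-peer-aaaa"}]},
        {"destinationCidrBlock": "10.0.0.0/24",
         "transitGatewayAttachments": [
             {"resourceType": "connect", "resourceId": "tgw-connect-peer-aaaa"}]},
        {"destinationCidrBlock": "10.1.0.0/16",
         "transitGatewayAttachments": [
             {"resourceType": "vpc", "resourceId": "vpc-1234"}]},
        {"destinationCidrBlock": "10.2.0.0/24",
         "transitGatewayAttachments": [
             {"resourceType": "connect", "resourceId": "tgw-connect-peer-bbbb"}]},
    ]
}

counts = count_prefixes_per_peer(sample_export)
print(counts)
```

Using a set per peer means a prefix that appears more than once is counted only once. The template reaches the same result differently: it de-duplicates within each export and again when it aggregates across route tables.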

Note: The solution includes an IAM policy granting necessary permissions to describe Transit Gateways, export routes, manage the S3 bucket, and publish CloudWatch metrics.

Deployment Guide

Save the CloudFormation template below to a file, then follow the steps after it to deploy the solution:

AWSTemplateFormatVersion: '2010-09-09'
Description: >
  CloudFormation template for Transit Gateway Connect Peer Route Count Metrics.
  Deploys a Lambda function that discovers TGW Connect peers, counts propagated routes,
  and publishes metrics to CloudWatch every 5 minutes.

# =============================================================================
# Parameters
# =============================================================================
Parameters:
  PublishMetrics:
    Type: String
    Default: 'true'
    AllowedValues:
      - 'true'
      - 'false'
    Description: Whether to publish metrics to CloudWatch
  
  CleanupBucket:
    Type: String
    Default: 'true'
    AllowedValues:
      - 'true'
      - 'false'
    Description: Whether to delete exported route files from S3 after processing
  
  LambdaTimeout:
    Type: Number
    Default: 300
    MinValue: 60
    MaxValue: 900
    Description: Lambda function timeout in seconds
  
  LambdaMemorySize:
    Type: Number
    Default: 256
    AllowedValues:
      - 128
      - 256
      - 512
      - 1024
    Description: Lambda function memory size in MB

# =============================================================================
# Resources
# =============================================================================
Resources:

  # ---------------------------------------------------------------------------
  # S3 Bucket for Route Exports
  # ---------------------------------------------------------------------------
  RouteExportBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub 'connect-peer-propagated-routes-${AWS::AccountId}-${AWS::Region}'
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      LifecycleConfiguration:
        Rules:
          - Id: DeleteExportedFilesAfter1Day
            Status: Enabled
            ExpirationInDays: 1
      Tags:
        - Key: Purpose
          Value: TGW-Connect-Route-Export

  # ---------------------------------------------------------------------------
  # IAM Role for Lambda Function
  # ---------------------------------------------------------------------------
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub 'TgwConnectRouteMetrics-LambdaRole-${AWS::Region}'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: TgwConnectRouteMetricsPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              # EC2 Permissions for Transit Gateway
              - Sid: TransitGatewayReadAccess
                Effect: Allow
                Action:
                  - ec2:DescribeTransitGateways
                  - ec2:DescribeTransitGatewayRouteTables
                  - ec2:GetTransitGatewayRouteTablePropagations
                  - ec2:ExportTransitGatewayRoutes
                Resource: '*'
              
              # S3 Permissions for Route Export Bucket
              - Sid: S3BucketAccess
                Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:ListBucket
                  - s3:DeleteObject
                  - s3:PutObject
                Resource:
                  - !GetAtt RouteExportBucket.Arn
                  - !Sub '${RouteExportBucket.Arn}/*'
              
              # CloudWatch Permissions for Publishing Metrics
              - Sid: CloudWatchMetricsAccess
                Effect: Allow
                Action:
                  - cloudwatch:PutMetricData
                Resource: '*'
                Condition:
                  StringEquals:
                    cloudwatch:namespace: TgwConnectPropagatedRouteCount

  # ---------------------------------------------------------------------------
  # Lambda Function
  # ---------------------------------------------------------------------------
  TgwConnectRouteMetricsFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: TgwConnectRouteMetrics
      Description: >
        Discovers TGW Connect peers, counts propagated routes, and publishes metrics to CloudWatch
      Runtime: python3.12
      Handler: index.lambda_handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Timeout: !Ref LambdaTimeout
      MemorySize: !Ref LambdaMemorySize
      Environment:
        Variables:
          S3_BUCKET: !Ref RouteExportBucket
          PUBLISH_METRICS: !Ref PublishMetrics
          CLEANUP_BUCKET: !Ref CleanupBucket
      Code:
        ZipFile: |
          import boto3
          import json
          import os
          from urllib.parse import urlparse
          from datetime import datetime
          from botocore.exceptions import ClientError

          # =============================================================================
          # Constants
          # =============================================================================
          CLOUDWATCH_NAMESPACE = 'TgwConnectPropagatedRouteCount'
          METRIC_NAME = 'PropagatedRouteCount'

          # =============================================================================
          # AWS API Functions
          # =============================================================================
          def get_all_transit_gateways(ec2_client):
              """Get all Transit Gateways in the region."""
              transit_gateways = []
              try:
                  paginator = ec2_client.get_paginator('describe_transit_gateways')
                  
                  for page in paginator.paginate():
                      for tgw in page.get('TransitGateways', []):
                          if tgw.get('State') == 'available':
                              transit_gateways.append(tgw.get('TransitGatewayId'))
              except Exception as e:
                  print(f"Error getting transit gateways: {e}")
              
              return transit_gateways

          def get_tgw_route_tables(ec2_client, transit_gateway_id):
              """Get all route tables for a specific Transit Gateway."""
              route_tables = []
              try:
                  paginator = ec2_client.get_paginator('describe_transit_gateway_route_tables')
                  
                  for page in paginator.paginate(
                      Filters=[
                          {'Name': 'transit-gateway-id', 'Values': [transit_gateway_id]},
                          {'Name': 'state', 'Values': ['available']}
                      ]
                  ):
                      for rt in page.get('TransitGatewayRouteTables', []):
                          route_tables.append(rt.get('TransitGatewayRouteTableId'))
              except Exception as e:
                  print(f"Error getting route tables for {transit_gateway_id}: {e}")
              
              return route_tables

          def get_connect_attachments_propagating_to_route_table(ec2_client, route_table_id):
              """Get Connect attachments that are propagating to a specific route table."""
              connect_attachments = []
              try:
                  paginator = ec2_client.get_paginator('get_transit_gateway_route_table_propagations')
                  
                  for page in paginator.paginate(TransitGatewayRouteTableId=route_table_id):
                      for propagation in page.get('TransitGatewayRouteTablePropagations', []):
                          resource_type = propagation.get('ResourceType')
                          attachment_id = propagation.get('TransitGatewayAttachmentId')
                          state = propagation.get('State')
                          
                          if resource_type == 'connect' and state == 'enabled':
                              connect_attachments.append(attachment_id)
              except Exception as e:
                  print(f"Error getting connect attachments for route table {route_table_id}: {e}")
              
              return connect_attachments

          def export_tgw_routes_to_s3(ec2_client, route_table_id, s3_bucket):
              """Export Transit Gateway routes to S3 with active/propagated filters."""
              try:
                  filters = [
                      {'Name': 'state', 'Values': ['active']},
                      {'Name': 'type', 'Values': ['propagated']}
                  ]
                  
                  response = ec2_client.export_transit_gateway_routes(
                      TransitGatewayRouteTableId=route_table_id,
                      S3Bucket=s3_bucket,
                      Filters=filters
                  )
                  
                  return response.get('S3Location')
              except Exception as e:
                  print(f"Error exporting routes for route table {route_table_id}: {e}")
                  raise

          def download_json_from_s3(s3_client, s3_location):
              """Download and parse JSON file from S3 location."""
              try:
                  parsed = urlparse(s3_location)
                  
                  if parsed.netloc.startswith('s3'):
                      path_parts = parsed.path.lstrip('/').split('/', 1)
                      bucket = path_parts[0]
                      key = path_parts[1]
                  else:
                      bucket = parsed.netloc.split('.')[0]
                      key = parsed.path.lstrip('/')
                  
                  response = s3_client.get_object(Bucket=bucket, Key=key)
                  content = response['Body'].read().decode('utf-8')
                  
                  return json.loads(content), key
              except Exception as e:
                  print(f"Error downloading from S3 location {s3_location}: {e}")
                  raise

          # =============================================================================
          # CloudWatch Functions
          # =============================================================================
          def publish_metrics_batch(cloudwatch_client, metrics_data):
              """Publish multiple route count metrics to CloudWatch in batches."""
              BATCH_SIZE = 20
              metrics_published = 0
              
              try:
                  metric_data_list = []
                  
                  for metric in metrics_data:
                      dimensions = [
                          {'Name': 'ConnectAttachmentId', 'Value': metric['connect_attachment_id']},
                          {'Name': 'ConnectPeerId', 'Value': metric['connect_peer_id']}
                      ]
                      
                      metric_data_list.append({
                          'MetricName': METRIC_NAME,
                          'Dimensions': dimensions,
                          'Value': metric['route_count'],
                          'Unit': 'Count',
                          # Timezone-aware timestamp; datetime.utcnow() is deprecated in Python 3.12
                          'Timestamp': datetime.now().astimezone()
                      })
                  
                  for i in range(0, len(metric_data_list), BATCH_SIZE):
                      batch = metric_data_list[i:i + BATCH_SIZE]
                      
                      cloudwatch_client.put_metric_data(
                          Namespace=CLOUDWATCH_NAMESPACE,
                          MetricData=batch
                      )
                      
                      metrics_published += len(batch)
              except Exception as e:
                  print(f"Error publishing metrics to CloudWatch: {e}")
              
              return metrics_published

          def collect_unique_peer_metrics(complete_mapping):
              """Collect metrics data for unique connect peers across all route tables."""
              peer_aggregation = {}
              
              for tgw_id, tgw_data in complete_mapping.items():
                  for rt_id, rt_data in tgw_data['route_tables'].items():
                      peers = rt_data.get('peers', {})
                      
                      for peer_id, peer_data in peers.items():
                          if peer_id not in peer_aggregation:
                              peer_aggregation[peer_id] = {
                                  'connect_attachment_id': peer_data['connect_attachment_id'],
                                  'routes': set()
                              }
                          
                          peer_aggregation[peer_id]['routes'].update(peer_data['routes'])
              
              metrics_data = []
              for peer_id, peer_data in peer_aggregation.items():
                  metrics_data.append({
                      'connect_attachment_id': peer_data['connect_attachment_id'],
                      'connect_peer_id': peer_id,
                      'route_count': len(peer_data['routes'])
                  })
              
              return metrics_data

          # =============================================================================
          # Parsing Functions
          # =============================================================================
          def parse_connect_routes_by_peer(routes_data, connect_attachment_ids=None):
              """Parse routes and create mapping grouped by Connect peer."""
              if isinstance(routes_data, dict):
                  routes = routes_data.get('routes', routes_data.get('Routes', []))
              else:
                  routes = routes_data
              
              peer_mapping = {}
              
              for route in routes:
                  destination_cidr = route.get('destinationCidrBlock', route.get('DestinationCidrBlock'))
                  attachments = route.get('transitGatewayAttachments', route.get('TransitGatewayAttachments', []))
                  
                  for attachment in attachments:
                      resource_type = attachment.get('resourceType', attachment.get('ResourceType'))
                      
                      if resource_type == 'connect':
                          attachment_id = attachment.get('transitGatewayAttachmentId', 
                                                         attachment.get('TransitGatewayAttachmentId'))
                          connect_peer_id = attachment.get('resourceId', attachment.get('ResourceId'))
                          
                          if connect_attachment_ids and attachment_id not in connect_attachment_ids:
                              continue
                          
                          if connect_peer_id not in peer_mapping:
                              peer_mapping[connect_peer_id] = {
                                  'connect_attachment_id': attachment_id,
                                  'routes': []
                              }
                          
                          if destination_cidr not in peer_mapping[connect_peer_id]['routes']:
                              peer_mapping[connect_peer_id]['routes'].append(destination_cidr)
              
              return peer_mapping

          # =============================================================================
          # S3 Cleanup Functions
          # =============================================================================
          def cleanup_s3_objects(s3_client, bucket_name, keys_to_delete):
              """Clean up specific exported route files from S3 bucket."""
              if not keys_to_delete:
                  return 0
              
              deleted_count = 0
              
              try:
                  delete_objects = [{'Key': key} for key in keys_to_delete]
                  
                  for i in range(0, len(delete_objects), 1000):
                      batch = delete_objects[i:i + 1000]
                      s3_client.delete_objects(
                          Bucket=bucket_name,
                          Delete={'Objects': batch}
                      )
                      deleted_count += len(batch)
                  
                  return deleted_count
                  
              except ClientError as e:
                  print(f"Error cleaning up S3 objects: {e}")
                  return deleted_count

          # =============================================================================
          # Main Discovery and Mapping Functions
          # =============================================================================
          def discover_tgw_structure(ec2_client):
              """Discover all Transit Gateways, their route tables, and Connect attachments."""
              print("Discovering Transit Gateways...")
              transit_gateways = get_all_transit_gateways(ec2_client)
              print(f"Found {len(transit_gateways)} Transit Gateway(s)")
              
              tgw_structure = {}
              
              for tgw_id in transit_gateways:
                  route_tables = get_tgw_route_tables(ec2_client, tgw_id)
                  print(f"TGW {tgw_id}: Found {len(route_tables)} route table(s)")
                  
                  tgw_structure[tgw_id] = {'route_tables': {}}
                  
                  for rt_id in route_tables:
                      tgw_structure[tgw_id]['route_tables'][rt_id] = {'connect_attachments': []}
              
              for tgw_id, tgw_data in tgw_structure.items():
                  for rt_id in tgw_data['route_tables']:
                      connect_attachments = get_connect_attachments_propagating_to_route_table(ec2_client, rt_id)
                      tgw_structure[tgw_id]['route_tables'][rt_id]['connect_attachments'] = connect_attachments
                      
                      if connect_attachments:
                          print(f"Route Table {rt_id}: {len(connect_attachments)} Connect attachment(s)")
              
              return tgw_structure

          def export_and_map_connect_routes(ec2_client, s3_client, s3_bucket, tgw_structure):
              """Export routes for each route table with Connect attachments and create peer mapping."""
              result = {}
              processed_attachments = set()
              exported_keys = []
              
              for tgw_id, tgw_data in tgw_structure.items():
                  result[tgw_id] = {'route_tables': {}}
                  
                  for rt_id, rt_data in tgw_data['route_tables'].items():
                      connect_attachments = rt_data['connect_attachments']
                      
                      result[tgw_id]['route_tables'][rt_id] = {
                          'connect_attachments': connect_attachments,
                          'peers': {}
                      }
                      
                      if not connect_attachments:
                          continue
                      
                      new_attachments = [att for att in connect_attachments if att not in processed_attachments]
                      
                      if not new_attachments:
                          continue
                      
                      try:
                          s3_location = export_tgw_routes_to_s3(ec2_client, rt_id, s3_bucket)
                          print(f"Exported routes for {rt_id}")
                          
                          routes_data, s3_key = download_json_from_s3(s3_client, s3_location)
                          exported_keys.append(s3_key)
                          
                          peer_mapping = parse_connect_routes_by_peer(routes_data, connect_attachments)
                          result[tgw_id]['route_tables'][rt_id]['peers'] = peer_mapping
                          
                          processed_attachments.update(connect_attachments)
                          
                          print(f"Found {len(peer_mapping)} Connect peer(s) for {rt_id}")
                          
                      except Exception as e:
                          print(f"Error processing route table {rt_id}: {str(e)}")
                          result[tgw_id]['route_tables'][rt_id]['error'] = str(e)
              
              return result, exported_keys

          def publish_metrics_to_cloudwatch(cloudwatch_client, complete_mapping):
              """Publish route count metrics to CloudWatch for unique Connect peers."""
              metrics_data = collect_unique_peer_metrics(complete_mapping)
              
              if not metrics_data:
                  print("No metrics to publish (no Connect peers found)")
                  return 0, []
              
              print(f"Publishing {len(metrics_data)} unique Connect peer metric(s)...")
              
              for metric in metrics_data:
                  print(f"  {metric['connect_peer_id']}: {metric['route_count']} routes")
              
              metrics_published = publish_metrics_batch(cloudwatch_client, metrics_data)
              print(f"Successfully published {metrics_published} metric(s) to CloudWatch")
              
              return metrics_published, metrics_data

          # =============================================================================
          # Lambda Handler
          # =============================================================================
          def lambda_handler(event, context):
              """Lambda function handler."""
              s3_bucket = os.environ.get('S3_BUCKET')
              publish_metrics = os.environ.get('PUBLISH_METRICS', 'true').lower() == 'true'
              cleanup_bucket = os.environ.get('CLEANUP_BUCKET', 'true').lower() == 'true'
              
              print(f"Configuration:")
              print(f"  S3 Bucket: {s3_bucket}")
              print(f"  Publish Metrics: {publish_metrics}")
              print(f"  Cleanup Bucket: {cleanup_bucket}")
              
              ec2_client = boto3.client('ec2')
              s3_client = boto3.client('s3')
              cloudwatch_client = boto3.client('cloudwatch')
              
              tgw_structure = discover_tgw_structure(ec2_client)
              
              complete_mapping, exported_keys = export_and_map_connect_routes(
                  ec2_client, s3_client, s3_bucket, tgw_structure
              )
              
              metrics_published = 0
              unique_peer_metrics = []
              if publish_metrics:
                  metrics_published, unique_peer_metrics = publish_metrics_to_cloudwatch(
                      cloudwatch_client, complete_mapping
                  )
              
              if cleanup_bucket and exported_keys:
                  deleted_count = cleanup_s3_objects(s3_client, s3_bucket, exported_keys)
                  print(f"Cleaned up {deleted_count} exported file(s) from S3")
              
              total_tgws = len(complete_mapping)
              total_peers = len(unique_peer_metrics)
              total_routes = sum(m['route_count'] for m in unique_peer_metrics)
              
              result = {
                  'statusCode': 200,
                  'body': {
                      'transit_gateways': total_tgws,
                      'connect_peers': total_peers,
                      'total_routes': total_routes,
                      'metrics_published': metrics_published,
                      'peer_metrics': unique_peer_metrics
                  }
              }
              
              print(f"Summary: {total_tgws} TGWs, {total_peers} peers, {total_routes} routes, {metrics_published} metrics published")
              
              return result
      Tags:
        - Key: Purpose
          Value: TGW-Connect-Route-Metrics

  # ---------------------------------------------------------------------------
  # CloudWatch Events Rule - Triggers every 5 minutes
  # ---------------------------------------------------------------------------
  ScheduledRule:
    Type: AWS::Events::Rule
    Properties:
      Name: TgwConnectRouteMetrics-Schedule
      Description: Triggers TGW Connect Route Metrics Lambda every 5 minutes
      ScheduleExpression: 'rate(5 minutes)'
      State: ENABLED
      Targets:
        - Id: TgwConnectRouteMetricsTarget
          Arn: !GetAtt TgwConnectRouteMetricsFunction.Arn

  # ---------------------------------------------------------------------------
  # Lambda Permission for CloudWatch Events
  # ---------------------------------------------------------------------------
  LambdaInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref TgwConnectRouteMetricsFunction
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt ScheduledRule.Arn

  # ---------------------------------------------------------------------------
  # CloudWatch Log Group for Lambda
  # ---------------------------------------------------------------------------
  LambdaLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub '/aws/lambda/${TgwConnectRouteMetricsFunction}'
      RetentionInDays: 14

# =============================================================================
# Outputs
# =============================================================================
Outputs:
  LambdaFunctionArn:
    Description: ARN of the Lambda function
    Value: !GetAtt TgwConnectRouteMetricsFunction.Arn
    Export:
      Name: !Sub '${AWS::StackName}-LambdaFunctionArn'

  LambdaFunctionName:
    Description: Name of the Lambda function
    Value: !Ref TgwConnectRouteMetricsFunction
    Export:
      Name: !Sub '${AWS::StackName}-LambdaFunctionName'

  S3BucketName:
    Description: Name of the S3 bucket for route exports
    Value: !Ref RouteExportBucket
    Export:
      Name: !Sub '${AWS::StackName}-S3BucketName'

  S3BucketArn:
    Description: ARN of the S3 bucket for route exports
    Value: !GetAtt RouteExportBucket.Arn
    Export:
      Name: !Sub '${AWS::StackName}-S3BucketArn'

  IAMRoleArn:
    Description: ARN of the Lambda execution role
    Value: !GetAtt LambdaExecutionRole.Arn
    Export:
      Name: !Sub '${AWS::StackName}-IAMRoleArn'

  CloudWatchNamespace:
    Description: CloudWatch namespace for metrics
    Value: TgwConnectPropagatedRouteCount

  CloudWatchMetricName:
    Description: CloudWatch metric name
    Value: PropagatedRouteCount

  ScheduleExpression:
    Description: Schedule expression for Lambda trigger
    Value: 'rate(5 minutes)'

  ScheduleRuleArn:
    Description: ARN of the CloudWatch Events schedule rule
    Value: !GetAtt ScheduledRule.Arn
    Export:
      Name: !Sub '${AWS::StackName}-ScheduleRuleArn'

  1. Navigate to the AWS CloudFormation console in your desired Region

  2. Create a new stack by uploading the template and providing a stack name (e.g., "TGW-Connect-Prefix-Monitor")

  3. Configure parameters:

    • PublishMetrics: Set to 'true' to enable CloudWatch metric publishing (default: true)
    • CleanupBucket: Set to 'true' to automatically delete exported route files (default: true)
    • LambdaTimeout: Adjust timeout based on environment size (default: 300 seconds)
    • LambdaMemorySize: Select memory allocation (default: 256 MB)
  4. Review and acknowledge IAM resource creation, then create the stack

  5. Wait for stack creation to complete (typically 2-3 minutes). The Lambda function then runs automatically every 5 minutes.
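The console steps above can also be scripted. The following is a minimal sketch, assuming the template has been saved locally (the file name in the usage note is hypothetical) and that boto3 credentials are configured. The helper names are illustrative. `CAPABILITY_NAMED_IAM` is passed because the template assigns an explicit `RoleName` to the IAM role.

```python
def build_stack_parameters(publish_metrics=True, cleanup_bucket=True,
                           lambda_timeout=300, lambda_memory_size=256):
    """Build the Parameters list for create_stack, mirroring the template's constraints."""
    if not 60 <= lambda_timeout <= 900:
        raise ValueError("LambdaTimeout must be between 60 and 900 seconds")
    if lambda_memory_size not in (128, 256, 512, 1024):
        raise ValueError("LambdaMemorySize must be 128, 256, 512, or 1024")
    values = {
        "PublishMetrics": "true" if publish_metrics else "false",
        "CleanupBucket": "true" if cleanup_bucket else "false",
        "LambdaTimeout": str(lambda_timeout),
        "LambdaMemorySize": str(lambda_memory_size),
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def deploy_stack(template_path, stack_name="TGW-Connect-Prefix-Monitor",
                 region=None, **options):
    """Create the stack from a local template file (requires boto3 and credentials)."""
    import boto3  # imported lazily so build_stack_parameters works without boto3

    with open(template_path) as handle:
        template_body = handle.read()
    cloudformation = boto3.client("cloudformation", region_name=region)
    return cloudformation.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=build_stack_parameters(**options),
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )


params = build_stack_parameters(lambda_timeout=600)
print(params)
```

A call such as `deploy_stack("tgw-connect-prefix-monitor.yaml", region="us-east-1", lambda_timeout=600)` would then start stack creation.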

Understanding the Results

After deployment, metrics appear in CloudWatch under the custom namespace TgwConnectPropagatedRouteCount.

Metric Structure:

  1. Namespace: TgwConnectPropagatedRouteCount
  2. Metric Name: PropagatedRouteCount
  3. Dimensions:
    • ConnectAttachmentId: The Transit Gateway Connect attachment ID
    • ConnectPeerId: The specific Connect peer ID advertising routes

Interpreting the Metrics:

Each data point represents the total number of unique prefixes advertised by a Connect peer at that time. For example:

  • A value of 250 indicates the Connect peer is advertising 250 routes
  • A value approaching 1,000 suggests you're nearing the default service quota
  • Sudden increases may indicate route leaks or misconfigurations
  • Decreasing values could signal BGP session issues or intentional route filtering
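To make these readings concrete, a data point can be expressed as a percentage of the quota. This is a small sketch with hypothetical function names. The 1,000-prefix default comes from this article, and the 80% warning level is an arbitrary assumption to adjust for your environment.

```python
DEFAULT_PREFIX_QUOTA = 1000  # default quota per Connect peer cited in this article


def quota_utilization(prefix_count, quota=DEFAULT_PREFIX_QUOTA):
    """Return the prefix count as a percentage of the quota."""
    return 100.0 * prefix_count / quota


def classify_prefix_count(prefix_count, warn_pct=80.0, quota=DEFAULT_PREFIX_QUOTA):
    """Label a data point; warn_pct is an assumed threshold, not an AWS default."""
    utilization = quota_utilization(prefix_count, quota)
    if utilization >= 100.0:
        return "at-or-over-quota"
    if utilization >= warn_pct:
        return "warning"
    return "ok"


for value in (250, 850, 1000):
    print(value, quota_utilization(value), classify_prefix_count(value))
```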

Example Scenario:

In this test scenario, there are two Connect attachments (tgw-attach-0b4b4e257b5fa2d37 and tgw-attach-07d04499b4eac85d1) with one and two Connect peers, respectively:

  • tgw-attach-0b4b4e257b5fa2d37 with Connect peer tgw-connect-peer-0f2e07b3eb2bd7bb1 advertising 10 prefixes
  • tgw-attach-07d04499b4eac85d1 with Connect peers tgw-connect-peer-028db0d2b9d51b78b and tgw-connect-peer-0f537dac1b40dcefb, each advertising 5 prefixes

There are three separate metrics, each tracking its respective Connect peer's prefix count over time.

Integration with Other Solutions

CloudWatch Alarms for Threshold Monitoring: Create alarms to notify you when prefix counts exceed specified thresholds.
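As an illustration, the arguments for one such alarm could be assembled as follows. This is a sketch rather than part of the template: the alarm name pattern, the threshold of 800, and the helper name are assumptions, while the namespace, metric name, and dimensions match what the Lambda function publishes.

```python
def build_prefix_alarm(attachment_id, peer_id, threshold=800, sns_topic_arn=None):
    """Build keyword arguments for CloudWatch put_metric_alarm for one Connect peer."""
    alarm = {
        "AlarmName": f"TgwConnectPrefixCount-{peer_id}",  # hypothetical naming scheme
        "Namespace": "TgwConnectPropagatedRouteCount",
        "MetricName": "PropagatedRouteCount",
        "Dimensions": [
            {"Name": "ConnectAttachmentId", "Value": attachment_id},
            {"Name": "ConnectPeerId", "Value": peer_id},
        ],
        "Statistic": "Maximum",
        "Period": 300,             # matches the 5-minute publishing interval
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "missing",
    }
    if sns_topic_arn:
        alarm["AlarmActions"] = [sns_topic_arn]
    return alarm


alarm_kwargs = build_prefix_alarm(
    "tgw-attach-0b4b4e257b5fa2d37", "tgw-connect-peer-0f2e07b3eb2bd7bb1"
)
print(alarm_kwargs["AlarmName"])
```

With boto3 available, the alarm would be created with `boto3.client("cloudwatch").put_metric_alarm(**alarm_kwargs)`. Both dimensions must be supplied because the metric is published with both.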

CloudWatch Dashboards: Build custom dashboards combining:

  • Prefix count metrics for all Connect peers
  • BGP session state metrics (from the BGP monitoring solution)

Multi-Account Monitoring: Deploy the solution in a centralized monitoring account:

  • Extend the Lambda function to assume cross-account IAM roles so that it can monitor Transit Gateway Connect attachments in multiple accounts
  • Aggregate metrics in a single CloudWatch namespace
  • Create organization-wide dashboards and alarms
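The template as provided monitors only the account it is deployed in, so cross-account monitoring requires extending the Lambda function. One possible approach is sketched below. It assumes a role named `TgwConnectMonitoringRole` (a hypothetical name) exists in each monitored account, trusts the monitoring account, and grants the same EC2 permissions as the template's policy. It also assumes the standard `aws` partition.

```python
def monitoring_role_arn(account_id, role_name="TgwConnectMonitoringRole"):
    """Build the ARN of the (hypothetical) role to assume in a monitored account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def client_kwargs_from_credentials(credentials, region):
    """Map the Credentials block of an AssumeRole response to boto3 client arguments."""
    return {
        "region_name": region,
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }


def ec2_client_for_account(account_id, region):
    """Return an EC2 client that acts in another account (requires boto3)."""
    import boto3  # imported lazily so the pure helpers above work without boto3

    response = boto3.client("sts").assume_role(
        RoleArn=monitoring_role_arn(account_id),
        RoleSessionName="tgw-connect-prefix-monitor",
    )
    return boto3.client(
        "ec2", **client_kwargs_from_credentials(response["Credentials"], region)
    )


print(monitoring_role_arn("111122223333"))
```

The resulting client could be passed to the template's discovery functions in place of the default `ec2_client`. How the export bucket is shared across accounts is left open here.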

Cleanup

To remove all resources created by this solution:

  • Navigate to the AWS CloudFormation console
  • Select the stack you created (e.g., "TGW-Connect-Prefix-Monitor")
  • Choose Delete and confirm the deletion
  • CloudFormation removes the Lambda function, S3 bucket, IAM role, EventBridge rule, and CloudWatch log group. The S3 bucket can only be deleted when it is empty, so if stack deletion fails on the bucket, empty it and retry the deletion

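Note: CloudWatch alarms and dashboards you created manually are not deleted automatically. Published metric data cannot be deleted manually and remains until it ages out under CloudWatch's retention schedule.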

Considerations

Pricing:

This solution incurs costs across multiple AWS services:

  • AWS Lambda: Charges based on execution time and memory.
  • Amazon S3: Minimal storage costs due to 1-day lifecycle policy.
  • CloudWatch Metrics: Custom metrics are charged at $0.30 per metric per month. Cost scales with the number of Connect peers (e.g., 10 peers = $3.00/month)
  • CloudWatch Alarms: $0.10 per alarm per month (if configured)
  • Data Transfer: Route export to S3 incurs minimal data transfer charges
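The per-peer arithmetic above is easy to reproduce. The sketch below uses only the metric and alarm prices quoted in this article, held as integer cents to avoid floating-point rounding. It leaves out Lambda, S3, and data transfer charges, and the function names are illustrative.

```python
METRIC_CENTS_PER_MONTH = 30  # $0.30 per custom metric, as quoted above
ALARM_CENTS_PER_MONTH = 10   # $0.10 per alarm, as quoted above


def monthly_cloudwatch_cost(connect_peers, alarms=0):
    """Estimate monthly CloudWatch cost in dollars: one custom metric per Connect peer."""
    cents = connect_peers * METRIC_CENTS_PER_MONTH + alarms * ALARM_CENTS_PER_MONTH
    return cents / 100


def invocations_per_month(interval_minutes=5, days=30):
    """Number of scheduled Lambda runs in a month at the given interval."""
    return (24 * 60 // interval_minutes) * days


print(monthly_cloudwatch_cost(10))             # 10 peers, no alarms
print(monthly_cloudwatch_cost(10, alarms=10))  # one alarm per peer
print(invocations_per_month())                 # runs per 30-day month
```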

Multi-Region Deployment:

To monitor Transit Gateway Connect peers across all regions:

  • Deploy per region: CloudFormation stacks must be deployed in each region where Transit Gateways exist, as the solution monitors resources regionally
  • Centralized monitoring: Configure CloudWatch cross-region dashboards to aggregate metrics from all regions in a single view
  • Automation: Use AWS CloudFormation StackSets to deploy the solution across multiple regions simultaneously
  • Naming conventions: Use consistent stack names with region suffixes (e.g., "TGW-Connect-Monitor-us-east-1") for easier management
  • Cost consideration: Multiply per-region costs by the number of deployed regions
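A cross-region view can be built from one widget per Region, each carrying its own `region` property. The sketch below generates a dashboard body in the CloudWatch dashboard JSON format. The helper name, widget layout, and the Regions and IDs in the sample input are assumptions for illustration.

```python
import json

NAMESPACE = "TgwConnectPropagatedRouteCount"
METRIC_NAME = "PropagatedRouteCount"


def build_dashboard_body(peers_by_region):
    """Build a dashboard body with one time-series widget per Region.

    peers_by_region maps a Region name to a list of
    (connect_attachment_id, connect_peer_id) tuples.
    """
    widgets = []
    for index, (region, peers) in enumerate(sorted(peers_by_region.items())):
        metrics = [
            [NAMESPACE, METRIC_NAME,
             "ConnectAttachmentId", attachment_id,
             "ConnectPeerId", peer_id]
            for attachment_id, peer_id in peers
        ]
        widgets.append({
            "type": "metric",
            "x": 0, "y": index * 6, "width": 24, "height": 6,
            "properties": {
                "title": f"Connect peer prefixes ({region})",
                "region": region,
                "metrics": metrics,
                "stat": "Maximum",
                "period": 300,
                "view": "timeSeries",
            },
        })
    return json.dumps({"widgets": widgets})


# Hypothetical inventory of peers per Region.
dashboard_body = build_dashboard_body({
    "us-east-1": [("tgw-attach-0b4b4e257b5fa2d37", "tgw-connect-peer-0f2e07b3eb2bd7bb1")],
    "eu-west-1": [("tgw-attach-example", "tgw-connect-peer-example")],
})
print(dashboard_body)
```

The resulting string can be passed as `DashboardBody` to the CloudWatch `put_dashboard` API along with a `DashboardName` of your choice.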

Scaling Considerations:

  • The Lambda timeout may need adjustment for environments with many Transit Gateways or Connect peers
  • Consider increasing Lambda memory for faster execution in large environments
  • The solution handles pagination automatically for environments with numerous resources

Related Resources