
How to Track and Limit Amazon Bedrock Usage by User

This article provides a practical guide to implementing user-level cost tracking and usage limiting for Amazon Bedrock, helping you maintain visibility and control as your AI applications grow.

Introduction

This guide covers three building blocks: Application Inference Profiles for cost attribution, CloudWatch for monitoring, and custom enforcement mechanisms for limiting usage. Together they give you visibility and control over your generative AI spending.

Environment

  • AWS Services: Amazon Bedrock, AWS Cost Explorer, Amazon CloudWatch, AWS Budgets, AWS Lambda, Amazon DynamoDB, AWS Step Functions
  • Prerequisites:
    • Active AWS account with Amazon Bedrock access
    • IAM permissions for Bedrock, CloudWatch, and Cost Management
    • Basic understanding of AWS tagging and cost allocation

The Challenge

Standard AWS billing shows Bedrock costs aggregated by model and region but doesn't provide granular attribution to individual IAM users, application tenants, business units, or projects. Without this visibility, you cannot allocate costs accurately, identify high-usage users, enforce budget limits, or optimize spending based on usage patterns.


Resolution

Part 1: Tracking Usage by User

Primary Method: Application Inference Profiles (Recommended)

Application Inference Profiles (AIPs) are the recommended way to get granular cost tracking in production. An AIP is a user-created inference profile that points to a Bedrock foundation model (or a cross-region inference profile) and can carry custom cost allocation tags.

Implementation steps:

  1. Create an AIP for each user, team, or tenant via the Bedrock console or API
  2. Apply custom tags (e.g., user:alice, team:engineering, tenant:acme-corp)
  3. Use the AIP ARN as the modelId in your inference API calls instead of the base model ID
  4. Activate the tags as cost allocation tags in the Billing and Cost Management console
  5. View costs in AWS Cost Explorer filtered by your custom tags
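
Steps 1 and 2 can be scripted. The sketch below assumes boto3 and the Bedrock control-plane `create_inference_profile` operation with its `modelSource.copyFrom` and `tags` parameters. The helper names, the `<owner>-profile` naming scheme, and the tag values are illustrative assumptions, not part of the original guide. The client is passed in, so the request can be inspected without calling AWS.

```python
import uuid


def build_profile_request(owner, model_arn, tags):
    """Build keyword arguments for a CreateInferenceProfile call."""
    return {
        # Hypothetical naming scheme: one profile per user, team, or tenant.
        "inferenceProfileName": f"{owner}-profile",
        "description": f"Cost tracking profile for {owner}",
        # Idempotency token so a retried call does not create a duplicate.
        "clientRequestToken": str(uuid.uuid4()),
        # copyFrom takes a foundation model ARN or the ARN of a
        # system-defined (cross-region) inference profile.
        "modelSource": {"copyFrom": model_arn},
        # Sorted so the request is deterministic.
        "tags": [{"key": k, "value": v} for k, v in sorted(tags.items())],
    }


def create_profile(bedrock, owner, model_arn, tags):
    """Create the profile and return the ARN to use as modelId."""
    response = bedrock.create_inference_profile(
        **build_profile_request(owner, model_arn, tags)
    )
    return response["inferenceProfileArn"]


# Usage (requires AWS credentials and Bedrock permissions):
#   import boto3
#   arn = create_profile(
#       boto3.client("bedrock", region_name="us-east-1"),
#       owner="user-alice",
#       model_arn="arn:aws:bedrock:us-east-1::foundation-model/"
#                 "anthropic.claude-3-sonnet-20240229-v1:0",
#       tags={"user": "alice", "team": "engineering"},
#   )
```

The returned ARN is what the inference call takes as its modelId.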

Example API call:

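import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Pass the AIP ARN as modelId; the profile ID below is a placeholder.
response = bedrock_runtime.converse(
    modelId="arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/abcd1234efgh",
    messages=[{"role": "user", "content": [{"text": "Hello"}]}]
)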

Benefits:

  • Once activated, tags appear in AWS Cost Explorer and Cost & Usage Reports with no extra processing
  • No application logic changes beyond using the AIP ARN
  • Works with Bedrock models that support inference profiles, including cross-region inference
  • Native integration with AWS cost management tools
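
Once tags are active, the Cost Explorer view can also be pulled programmatically. This is a minimal sketch assuming the Cost Explorer `get_cost_and_usage` operation; the helper names are hypothetical, and the `"Amazon Bedrock"` service filter value is an assumption to verify against the service names in your own account. Cost Explorer returns tag groups as `key$value` strings, with an empty value for untagged usage.

```python
def build_cost_query(start, end, tag_key):
    """Build keyword arguments for a Cost Explorer get_cost_and_usage call."""
    return {
        # Dates are YYYY-MM-DD strings; End is exclusive.
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
        # Assumed service name; check the value shown in your account.
        "Filter": {"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}},
    }


def cost_by_tag(response):
    """Sum UnblendedCost per tag value across all returned periods."""
    totals = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            # Group keys look like "user$alice"; untagged usage is "user$".
            _, _, value = group["Keys"][0].partition("$")
            label = value or "(untagged)"
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[label] = totals.get(label, 0.0) + amount
    return totals


# Usage (requires AWS credentials and Cost Explorer permissions):
#   import boto3
#   ce = boto3.client("ce")
#   print(cost_by_tag(ce.get_cost_and_usage(
#       **build_cost_query("2024-06-01", "2024-07-01", "user"))))
```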

Alternative Method: CloudWatch Logs Insights

Enable Bedrock model invocation logging to CloudWatch Logs. Each log entry includes the identity.arn field of the calling IAM principal along with the input and output token counts, so you can query usage per principal.

Implementation steps:

  1. Enable model invocation logging in the Bedrock console
  2. Query token usage using CloudWatch Logs Insights
  3. Create dashboards for visualization

Query example:

fields @timestamp, identity.arn, input.inputTokenCount, output.outputTokenCount
| filter identity.arn like /user\/alice/
| stats sum(input.inputTokenCount) as totalInput,
        sum(output.outputTokenCount) as totalOutput by identity.arn
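
To run a query like this on a schedule rather than in the console, the CloudWatch Logs `start_query` and `get_query_results` operations can be used. The sketch below is one way to do it under stated assumptions: the helper name, the polling defaults, and the log group name in the usage comment are hypothetical, and the query string is passed in unchanged.

```python
import time


def run_insights_query(logs, log_group, query_string, start_epoch, end_epoch,
                       poll_seconds=1.0, max_polls=60):
    """Run a Logs Insights query and return rows as {field: value} dicts."""
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=start_epoch,  # seconds since the Unix epoch
        endTime=end_epoch,
        queryString=query_string,
    )["queryId"]

    for _ in range(max_polls):
        result = logs.get_query_results(queryId=query_id)
        status = result["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells.
            return [{cell["field"]: cell["value"] for cell in row}
                    for row in result["results"]]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not complete in time")


# Usage (requires AWS credentials; the log group name is hypothetical):
#   import boto3, time
#   now = int(time.time())
#   rows = run_insights_query(boto3.client("logs"),
#                             "/bedrock/model-invocations",
#                             "stats count(*) by identity.arn",
#                             now - 86400, now)
```

Values come back as strings, so token totals need converting with int() before arithmetic.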

When to use:

  • You need detailed request-level analysis for debugging
  • You want to attribute costs without creating AIPs
  • You require audit trails for compliance

Limitations:

  • Not integrated with AWS Cost Explorer
  • Requires custom queries and dashboards
  • Additional CloudWatch Logs costs apply

Supplementary Method: Converse API Request Metadata

For applications using the Converse API, include custom metadata with each request:

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
    requestMetadata={"userId": "alice", "tenantId": "acme-corp"}
)

This metadata is written to the model invocation logs in CloudWatch Logs. It does not reach Cost Explorer on its own, but combined with log analysis it lets you build custom cost attribution by user or tenant.
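
As an illustration of that log analysis, the sketch below totals tokens per user from exported invocation log records. It assumes each record is a JSON line carrying a top-level `requestMetadata` object plus `input.inputTokenCount` and `output.outputTokenCount`; verify those field paths against your own logs. The function name and the `(unknown)` bucket are hypothetical.

```python
import json
from collections import defaultdict


def tokens_by_user(log_lines, metadata_key="userId"):
    """Aggregate input and output tokens per requestMetadata value."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for line in log_lines:
        record = json.loads(line)
        # Requests sent without metadata fall into a catch-all bucket.
        user = record.get("requestMetadata", {}).get(metadata_key, "(unknown)")
        totals[user]["input"] += record.get("input", {}).get("inputTokenCount", 0)
        totals[user]["output"] += record.get("output", {}).get("outputTokenCount", 0)
    return dict(totals)
```

Multiplying these totals by the per-token price of the model gives an approximate cost per user.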


Part 2: Limiting Usage and Controlling Costs

IAM policies can allow or deny Bedrock actions, but they cannot cap the number of API calls or tokens a principal consumes. Implement usage limits with one of these mechanisms:

Method                      Enforcement Speed         Complexity  Best For
CloudWatch Alarms + Lambda  Near real-time (minutes)  Medium      Automated responses to usage spikes
AWS Budgets                 Daily updates             Low         Cost-based alerts and notifications
Application-Layer Gateway   Real-time (milliseconds)  High        Strict quotas and rate limiting

Method 1: CloudWatch Alarms with Automated Response

Implementation steps:

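  1. Identify or create CloudWatch metrics for token usage per AIP (Bedrock publishes token-count metrics in the AWS/Bedrock namespace; custom metrics can be derived from invocation logs)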
  2. Set alarms at threshold values (e.g., 1M tokens/day)
  3. Configure alarm actions to trigger Lambda via SNS
  4. Lambda function updates IAM policies or sends notifications

Architecture: CloudWatch Alarm → SNS → Lambda → IAM policy update or access control

Best for: Automated enforcement with acceptable latency (5-15 minutes)
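
Steps 1 to 3 might look like the following sketch, which builds the arguments for the CloudWatch `put_metric_alarm` operation. It assumes Bedrock's `InputTokenCount` metric in the `AWS/Bedrock` namespace is reported with a `ModelId` dimension equal to the AIP ARN, which should be confirmed in the CloudWatch console before relying on it. The helper name, the alarm naming scheme, and the 5-minute window are illustrative choices.

```python
def build_token_alarm(profile_arn, sns_topic_arn, tokens_per_period,
                      period_seconds=300):
    """Build keyword arguments for a CloudWatch put_metric_alarm call."""
    profile_id = profile_arn.rsplit("/", 1)[-1]
    return {
        "AlarmName": f"bedrock-tokens-{profile_id}",  # hypothetical naming
        "Namespace": "AWS/Bedrock",
        "MetricName": "InputTokenCount",
        # Assumption: the metric's ModelId dimension carries the AIP ARN.
        "Dimensions": [{"Name": "ModelId", "Value": profile_arn}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": float(tokens_per_period),
        "ComparisonOperator": "GreaterThanThreshold",
        # No traffic means no data points; treat that as healthy.
        "TreatMissingData": "notBreaching",
        # The SNS topic fans out to the enforcement Lambda.
        "AlarmActions": [sns_topic_arn],
    }


# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(
#       **build_token_alarm(aip_arn, topic_arn, tokens_per_period=50_000))
```

This alarm watches input tokens over a rolling window. Covering output tokens as well needs a second alarm or a metric math expression, and a calendar-day limit such as 1M tokens/day needs a longer period or separate aggregation.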


Method 2: AWS Budgets with Tag-Based Tracking

Implementation steps:

  1. Create budgets for each AIP using cost allocation tags
  2. Configure alerts at 80%, 90%, and 100% of budget
  3. Set up notifications to SNS or email
  4. Optionally trigger Lambda for automated actions

Best for: Cost-based monitoring with daily granularity and forecasting

Limitation: Budget data updates only daily, so this method is not suitable for real-time enforcement
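
A tag-scoped budget with the three alert thresholds could be defined as in this sketch, which builds arguments for the AWS Budgets `create_budget` operation. The helper name and budget naming scheme are hypothetical. The `user:<key>$<value>` form of the `TagKeyValue` cost filter is my understanding of the Budgets API for user-defined tags and is worth checking against the current API reference.

```python
def build_budget_request(account_id, tag_key, tag_value, monthly_limit_usd,
                         sns_topic_arn, thresholds=(80, 90, 100)):
    """Build keyword arguments for an AWS Budgets create_budget call."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": f"bedrock-{tag_key}-{tag_value}",  # hypothetical naming
            "BudgetLimit": {"Amount": str(monthly_limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
            # Assumed filter syntax for user-defined cost allocation tags.
            "CostFilters": {"TagKeyValue": [f"user:{tag_key}${tag_value}"]},
        },
        # One notification per threshold, each as a percentage of the limit.
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": float(pct),
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "SNS", "Address": sns_topic_arn}
                ],
            }
            for pct in thresholds
        ],
    }


# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("budgets").create_budget(
#       **build_budget_request("123456789012", "user", "alice", 100, topic_arn))
```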


Method 3: Custom Application-Layer Gateway

Implement a "gatekeeper" pattern for strict, real-time enforcement:

Implementation steps:

  1. Route all Bedrock requests through a gateway you control (for example, API Gateway backed by Lambda)
  2. Track usage in DynamoDB (user ID → token count)
  3. Check limits before forwarding requests to Bedrock
  4. Return error if limit exceeded

Best for: Multi-tenant SaaS applications requiring precise, real-time quotas

Considerations: Adds latency (~50-200ms) and requires additional infrastructure
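
The control flow of the gatekeeper can be sketched in a few lines. This is a simplified illustration, not a production design: an in-memory dict stands in for the DynamoDB table, the class and method names are hypothetical, and the `invoke` callable is assumed to return a Converse-style response whose `usage.totalTokens` field reports the tokens consumed.

```python
class QuotaExceeded(Exception):
    """Raised when a user has no remaining token budget."""


class QuotaGateway:
    def __init__(self, invoke, limits):
        self._invoke = invoke   # callable(user_id, prompt) -> response dict
        self._limits = limits   # user ID -> token limit for the period
        self._used = {}         # stand-in for the DynamoDB usage table

    def remaining(self, user_id):
        """Tokens left for the user; unknown users get a limit of zero."""
        return self._limits.get(user_id, 0) - self._used.get(user_id, 0)

    def handle(self, user_id, prompt):
        # Check the limit before forwarding the request to Bedrock.
        if self.remaining(user_id) <= 0:
            raise QuotaExceeded(f"token limit reached for {user_id}")
        response = self._invoke(user_id, prompt)
        # Record actual usage after the call, from the response itself.
        tokens = response["usage"]["totalTokens"]
        self._used[user_id] = self._used.get(user_id, 0) + tokens
        return response
```

Because usage is recorded after the call, a single request can overshoot the limit. With concurrent callers, the read-then-write on the usage record should become one atomic operation, for example a DynamoDB conditional update.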


Decision Framework

Use Application Inference Profiles when:

  • You need accurate cost allocation in AWS Cost Explorer
  • You have multiple teams, projects, or tenants
  • You want minimal code changes

Use CloudWatch Logs Insights when:

  • You need detailed request-level analysis
  • You're debugging usage patterns or investigating issues
  • You want audit trails without creating AIPs

Use CloudWatch Alarms when:

  • You need near real-time enforcement
  • You want automated responses to usage anomalies

Use Application-Layer Gateway when:

  • You need strict, real-time usage limits
  • You're building a multi-tenant SaaS application
  • You require complex quota logic (daily/weekly/monthly limits)

Combine multiple methods when:

  • You manage large-scale, multi-tenant environments
  • You require defense-in-depth cost controls
  • You need both accurate allocation and real-time enforcement

Best Practices

  1. Start with Application Inference Profiles – They provide the best balance of accuracy and ease of implementation
  2. Enable model invocation logging early – Historical data is valuable for analysis and optimization
  3. Implement monitoring before enforcement – Understand usage patterns before setting limits
  4. Tag consistently – Establish and document tagging standards across your organization

Troubleshooting

Issue: Tags not appearing in Cost Explorer

  • Solution: Ensure the tags are activated as cost allocation tags in the Billing and Cost Management console. Allow up to 24 hours for tags to appear in reports.

Issue: CloudWatch Logs queries returning no results

  • Solution: Verify model invocation logging is enabled and logs are being delivered to CloudWatch. Check IAM permissions for log delivery.

Issue: Application-layer gateway adding too much latency

  • Solution: Optimize DynamoDB queries with proper indexing. Consider caching quota checks for short periods (30-60 seconds).
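
The short-lived caching suggested above could be sketched as follows. The class name and the `lookup` callable (which would wrap the DynamoDB read) are hypothetical, and the injectable clock exists only to make the behavior easy to test.

```python
import time


class CachedQuotaCheck:
    """Cache per-user allow/deny decisions for a short time-to-live."""

    def __init__(self, lookup, ttl_seconds=30.0, clock=time.monotonic):
        self._lookup = lookup   # callable(user_id) -> bool, e.g. a DynamoDB read
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}        # user ID -> (timestamp, allowed)

    def is_allowed(self, user_id):
        now = self._clock()
        cached = self._cache.get(user_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]    # still fresh: skip the backing lookup
        allowed = self._lookup(user_id)
        self._cache[user_id] = (now, allowed)
        return allowed
```

The trade-off is accuracy: a user who crosses the limit may keep being served until the cached entry expires.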

Issue: Budget alerts not triggering

  • Solution: Verify that budget thresholds are set correctly and that the SNS topic has valid subscriptions. Check that the budget's cost allocation tag filters match your AIP tags.
