Skip to content

Flex Service Tier for Bedrock models

1

It would be great to have all models in Bedrock (including Anthropic) work with the OpenAI service tiers (Priority, Standard, Flex).

I am more interested in the flex tiers for analytics. I currently run in batches for faster processing at the same price, but with the downsides that the flex tier has: https://developers.openai.com/api/docs/guides/flex-processing

Especially because we use tools.

2 Answers
6
Accepted Answer

To build on the previous response and provide more technical clarity for this specific use case:

  • OpenAI on Bedrock: While it's true that OpenAI models (like the gpt-oss series) are now appearing in Bedrock, it’s important to distinguish them from the standard GPT-4o models found elsewhere. These specific versions on Bedrock are designed to support the Flex and Priority tiers, making them a viable option if you are looking for that specific OpenAI architecture within the AWS ecosystem.

  • The Anthropic Factor: Since you specifically mentioned Anthropic, please note that the "Flex" tier rollout varies by provider. For Claude models, the most reliable way to achieve the "50% discount" for analytics—similar to the OpenAI Flex/Batch experience—is currently through Amazon Bedrock Batch Inference.

  • Batch vs. Flex for Tool Use: Since your workflow involves tools (function calling):

    • Flex Tier is great for low-priority live inference (results in minutes).

    • Batch Inference is often better for heavy analytics, as it processes large datasets asynchronously (up to 24 hours) with significant cost savings.

    • Note: Always verify that your specific model version (e.g., Claude 3.5 Sonnet) supports Tool Use within the Batch API in your selected region, as some asynchronous tiers have stricter schema requirements for tools.

  • Service Tier Selection: In the Bedrock Runtime API, you can indeed toggle between standard and flex using the service_tier parameter, but ensure your IAM permissions and Service Quotas are updated to include these new inference tiers, as they are managed separately from standard on-demand quotas.

EXPERT

answered a month ago

AWS
EXPERT

reviewed a month ago

  • I'm currently waiting for a quota reset (currently I have 0 quota for all models for some reason) does this mean we can already use the flex service tier with Claude Haiku 4.5 and Sonnet 4.6?

  • My understanding is, if your quota is showing as 0, it’s usually because the new 'Inference Profiles' require their own separate limits. Check the AWS Service Quotas dashboard specifically for Inference Profiles in your region and request an increase there.

0

Amazon Bedrock's Flex service tier is designed for non-time-critical workloads that can accept extended completion times in exchange for significant cost savings. The Flex tier offers approximately 50% discount on token processing compared to the Standard tier, making it ideal for scenarios where cost optimization is the priority over speed.

The Flex tier is well-suited for workloads such as model evaluations, content summarization, labeling and annotation, and multistep agentic workflows. Unlike other tiers that guarantee immediate availability, the Flex tier operates with lower queue priority, resulting in processing times measured in minutes rather than seconds, with a one-hour timeout limit. During periods of high demand, Flex requests receive lower priority relative to the Standard tier.

To use the Flex tier, you can set the "service_tier" optional parameter to "flex" for your inference request through the Amazon Bedrock runtime API. The tier operates on a discounted pay-as-you-go pricing model—you pay less per token but accept longer processing times and potential delays, with no prior reservation required.

Currently, the Flex tier is available for a range of foundation models including OpenAI (gpt-oss-20b, gpt-oss-120b), DeepSeek (DeepSeek V3.1), Qwen3 models, and Amazon Nova (Nova Pro and Nova Premier). However, model support varies, so it's recommended to check the documentation for the most current list of supported models. The Flex tier also supports explicit prompt caching, with the 90% discount applied to the already reduced Flex tier rates for supported models.
Sources
Service tiers for optimizing performance and cost - Amazon Bedrock
On-Demand Tiers – Amazon Bedrock – AWS
Balance cost, performance & reliability for AI at enterprise scale through Bedrock Inference Tiers | AWS re:Post
Amazon Bedrock introduces Priority and Flex inference service tiers - AWS
New Amazon Bedrock service tiers help you match AI workload performance with cost | AWS News Blog

answered a month ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.