Hello,
You may want to first check the AWS Health Dashboard / Service Health page for any reported Amazon Bedrock event in ap-northeast-1 during that timeframe.
Also, re:Post is not an official AWS Support channel, so I’m replying here as a member of the AWS community, not as AWS Support.
On your questions: Was this a service-level incident? I can't tell for sure. The Health Dashboard shows no outage for that Region, so you may need to open a support case to confirm.
Does “Too many connections” point to account concurrency or regional capacity? It is consistent with either request/concurrency limits on your side or temporary service/model-side capacity pressure. Since you were using InvokeModelWithResponseStream, note that each streaming call holds its connection open for the full response, so long-lived streams add more concurrency pressure than short non-streaming calls.
Are cross-region inference quotas different? Amazon Bedrock documents that cross-Region inference uses inference profiles and is intended to help absorb unplanned traffic bursts by routing across supported Regions. Bedrock also has service quotas for model usage and API operations, but the exact effective limits can vary by model, Region, and account. For the current values on your account, review the Service Quotas service.
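If you prefer to pull the quota values programmatically rather than through the console, here is a minimal sketch using the Service Quotas API via boto3. The helper name `list_bedrock_quotas` and the `name_filter` parameter are my own for illustration, and I am assuming the Bedrock service code is `bedrock`. The values returned are whatever is applied in the Region the client points at, so create the client for ap-northeast-1.

```python
def list_bedrock_quotas(client, name_filter=""):
    """Return {quota name: value} for Bedrock quotas.

    `client` is a boto3 Service Quotas client. `name_filter` is a
    hypothetical convenience: a case-insensitive substring match on the
    quota name (empty string keeps everything).
    """
    quotas = {}
    # list_service_quotas is paginated, so walk every page.
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="bedrock"):
        for quota in page.get("Quotas", []):
            name = quota["QuotaName"]
            if name_filter.lower() in name.lower():
                quotas[name] = quota["Value"]
    return quotas


# Usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   client = boto3.client("service-quotas", region_name="ap-northeast-1")
#   for name, value in sorted(list_bedrock_quotas(client, "per minute").items()):
#       print(f"{value:>14}  {name}")
```

Comparing the entries for your model with your observed traffic at the time of the errors should tell you whether an account-level limit is a plausible cause.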
Would Provisioned Throughput help? Potentially yes. AWS positions Provisioned Throughput as the option to reserve model capacity and reduce exposure to shared on-demand capacity fluctuations. If this workload is production-critical and the traffic pattern is sustained or predictable, that is worth evaluating. Otherwise, a quota review / increase request plus retry-and-backoff tuning is usually the first step.
What should you do next?
- Check AWS Health for that exact window
- Review Bedrock quotas in your account
- Open an AWS Support case with request IDs, timestamps, Region, model ID, and traffic volume/concurrency details
- Confirm whether the jp.* model access path you are using is via an inference profile / cross-Region route and whether that changes your effective throughput behavior
- Add client-side exponential backoff / jitter and consider connection pooling limits for streaming requests
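For the backoff item, here is a minimal sketch of exponential backoff with "full jitter" in plain Python. Everything named here (`ThrottledError`, `backoff_delays`, `call_with_backoff`, and the default values) is hypothetical and only for illustration. In a real client you would catch whatever throttling error your SDK raises (with boto3 I would expect a `ClientError` carrying a throttling error code, but please verify that against your own logs) and map it to the retry path.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling / 'Too many connections' error."""


def backoff_delays(count, base=0.5, cap=20.0, rng=random.random):
    """Yield `count` full-jitter delays in seconds.

    Delay i is drawn uniformly from [0, min(cap, base * 2**i)).
    """
    for attempt in range(count):
        yield rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(fn, max_attempts=5, base=0.5, cap=20.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(); on ThrottledError wait a jittered delay, then retry.

    Re-raises the last ThrottledError once max_attempts calls have failed.
    """
    delays = backoff_delays(max_attempts - 1, base, cap, rng)
    for _ in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            delay = next(delays, None)
            if delay is None:  # no retries left
                raise
            sleep(delay)
```

The jitter matters as much as the exponent: if many streaming clients are throttled at the same moment, randomized delays keep them from all reconnecting at once. A cap on concurrent in-flight streams (for example a semaphore around the call) is a separate control and is not shown here.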
If this backs a production service, I would treat it as something worth planning around rather than assuming it was only a one-off event.
