How do I optimize batch inference jobs in Amazon Bedrock?


My Amazon Bedrock batch inference jobs run slowly or fail.

Resolution

Batch job execution times depend on available capacity, concurrent jobs in the queue, and model-specific resource allocation. Use the following resolution methods to optimize your batch inference jobs in Amazon Bedrock.

Provide simple and complete input prompts

To reduce processing time and improve the quality of the results, create clear, concise prompts that don't include unnecessary context.
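For example, the following Python sketch builds a minimal batch input file in the JSONL format that batch inference jobs read from Amazon S3. The Anthropic Claude message body, prompts, and file name are example values; match the modelInput body to the model that your job targets.

```python
import json

# Example input data; replace with your own records.
reviews = [
    "Great product, arrived on time.",
    "Stopped working after a week.",
]

# Each line of the batch input file is one record: a unique recordId plus
# the same modelInput body that the model accepts with InvokeModel. The
# Anthropic Claude messages format below is one example.
records = [
    {
        "recordId": f"rec-{i:05d}",
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            # A concise, self-contained prompt with no unnecessary context.
            "messages": [
                {
                    "role": "user",
                    "content": f"Summarize this review in one sentence: {review}",
                }
            ],
        },
    }
    for i, review in enumerate(reviews)
]

with open("batch-input.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```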

Don't exceed service quotas

If you run multiple batch inference jobs in parallel, then make sure that the jobs stay within Amazon Bedrock service quotas. These quotas vary by model and AWS Region.

For more information about large-scale projects, see Automate Amazon Bedrock batch inference: Building a scalable and efficient pipeline.
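As one way to stay within the quota for concurrent batch inference jobs, the following Python (boto3) sketch counts jobs that are still active and waits before it submits another job. The MAX_CONCURRENT_JOBS value is a placeholder; check the actual quota for your model and Region in the Service Quotas console.

```python
import time

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Placeholder limit: look up the real quota for your model and Region.
MAX_CONCURRENT_JOBS = 10


def count_active_jobs():
    """Count jobs that still consume quota (pagination omitted for brevity)."""
    active = 0
    for status in ("Submitted", "Validating", "InProgress"):
        response = bedrock.list_model_invocation_jobs(statusEquals=status)
        active += len(response.get("invocationJobSummaries", []))
    return active


def submit_when_capacity_available(job_args, poll_seconds=60):
    """Wait until the active-job count drops below the limit, then submit."""
    while count_active_jobs() >= MAX_CONCURRENT_JOBS:
        time.sleep(poll_seconds)
    return bedrock.create_model_invocation_job(**job_args)
```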

Schedule jobs to run off-peak

Use Amazon EventBridge to schedule batch inference jobs during off-peak hours, when resource availability might be higher.
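For example, the following sketch uses EventBridge Scheduler to trigger a job submission at 02:00 UTC each day, assuming that's an off-peak window for your workload. The Lambda function and IAM role ARNs are hypothetical placeholders; the function would call the CreateModelInvocationJob API with your job settings.

```python
import boto3

scheduler = boto3.client("scheduler", region_name="us-east-1")

# Creates a daily schedule that invokes a Lambda function at 02:00 UTC.
# Replace the hypothetical function and role ARNs with your own resources.
scheduler.create_schedule(
    Name="bedrock-batch-offpeak",
    ScheduleExpression="cron(0 2 * * ? *)",
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:submit-batch-job",
        "RoleArn": "arn:aws:iam::111122223333:role/scheduler-invoke-lambda",
    },
)
```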

Use cross-Region inference

Use cross-Region inference profiles in CreateModelInvocationJob API requests to distribute workloads across Regions and draw on capacity outside your source Region.
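The following sketch passes a cross-Region inference profile ID as the modelId in a CreateModelInvocationJob request. The bucket names, role ARN, and profile ID are example values; profile availability varies by model and Region.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Passing a cross-Region inference profile (here, an example "us." profile)
# as modelId lets Amazon Bedrock route the job's traffic across the
# profile's Regions instead of a single Region.
response = bedrock.create_model_invocation_job(
    jobName="batch-job-cross-region",
    roleArn="arn:aws:iam::111122223333:role/bedrock-batch-role",
    modelId="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket/input/batch-input.jsonl"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket/output/"}
    },
)
print(response["jobArn"])
```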