I am getting read timeout errors when I use large language models (LLMs) in Amazon Bedrock to generate text.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
When you use large language models (LLMs) in Amazon Bedrock to generate text, you might get read timeout errors. These errors occur when the AWS SDK for Python (Boto3) client queries the LLM but doesn't receive a response within botocore's default read timeout period of 60 seconds. To resolve read timeout errors, increase the read timeout or use streaming APIs.
Increase the read timeout
It's a best practice to set the read_timeout value high enough for your queries to complete. Start with a large value, such as 3,600 seconds, and then adjust this duration until you no longer get a timeout error. To increase the read_timeout value, run code similar to the following example. For more information, see the read_timeout parameter in botocore.config.
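import boto3
from botocore.config import Config
# Wait up to 1,000 seconds for a response before botocore raises a read timeout error
config = Config(read_timeout=1000)
# Use a distinct variable name so that the client object doesn't shadow an imported function
bedrock_client = boto3.client(service_name='bedrock-runtime',
                              config=config)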
Note: Replace 1000 with your timeout value.
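To adjust the timeout based on observed behavior rather than guesswork, one option is to time a request that was made with a generous timeout and then derive a smaller value with some headroom. The following is a minimal sketch of that idea. The helper names timed_call and suggest_read_timeout, the safety factor of 2, and the 60-second floor are illustrative assumptions and aren't part of Boto3 or AWS guidance.

```python
import math
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def suggest_read_timeout(elapsed_seconds: float,
                         safety_factor: float = 2.0,
                         floor_seconds: int = 60) -> int:
    """Return a whole-second timeout with headroom over the observed latency.

    safety_factor and floor_seconds are hypothetical defaults chosen for
    illustration; tune them for your own workload.
    """
    return max(floor_seconds, math.ceil(elapsed_seconds * safety_factor))


# Hypothetical usage with a client that was created with a large read_timeout:
# response, elapsed = timed_call(bedrock_client.converse,
#                                modelId=model_id, messages=messages)
# print(suggest_read_timeout(elapsed))
```

A single measurement reflects only one prompt, so it's reasonable to time several representative requests and base the timeout on the slowest one.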
If you use third-party libraries, first instantiate an SDK for Python (Boto3) client with a botocore configuration. Then, pass the configured client as the client parameter to a callable model class.
To increase the read timeout value when you pass a Boto3 client to a third-party library, run code similar to the following example:
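import boto3
from botocore.config import Config
from langchain_aws import ChatBedrock
# Wait up to 1,000 seconds for a response before botocore raises a read timeout error
config = Config(read_timeout=1000)
bedrock_client = boto3.client(service_name='bedrock-runtime',
                              config=config)
# Pass the configured client so that the library uses the longer read timeout
llm = ChatBedrock(model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
                  client=bedrock_client)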
The preceding examples set the read timeout to 1,000 seconds. The read timeout period specifies how long botocore waits for a response from the server before it raises a read timeout exception.
Note: LLMs such as Anthropic Claude 3.7 Sonnet can take more than 60 seconds to return a response. For Anthropic Claude 3.7 Sonnet, it's a best practice to set a timeout value of at least 3,600 seconds.
Use ConverseStream to stream responses
If you work with long responses or provide partial results to users, then use the ConverseStream API operation to receive generated tokens. The ConverseStream API operation returns tokens as they're generated, which helps avoid timeouts on long responses.
To use the ConverseStream API operation, run code that's similar to the following example:
import boto3
from botocore.config import Config
# Configure the client
config = Config()
bedrock_client = boto3.client(service_name='bedrock-runtime', config=config)
# Create request parameters
request = {
    "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": "Could you write a very long story?"
                }
            ]
        }
    ]
}
# Call the streaming API
response = bedrock_client.converse_stream(
    modelId=request["modelId"],
    messages=request["messages"]
)
# Process the streaming response and print each text chunk as it arrives
for event in response['stream']:
    if 'contentBlockDelta' in event:
        print(event['contentBlockDelta']['delta']['text'], end='')
Note: Replace the modelId value with your model ID and the text value with your input text.
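The streaming loop above reads the text key of each delta directly, which assumes that every contentBlockDelta event carries text. If a response also contains other delta types, such as tool use input, a more defensive reader can skip them. The following sketch shows one way to collect the full text and the stop reason from the event stream. The function name collect_stream_text is hypothetical, and the event shapes (contentBlockDelta and messageStop with a stopReason) are assumed to match the ConverseStream response format.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def collect_stream_text(events: Iterable[Dict[str, Any]]) -> Tuple[str, Optional[str]]:
    """Join streamed text deltas and return (full_text, stop_reason).

    Events without a text delta (for example, tool use deltas or
    metadata events) are ignored.
    """
    parts = []
    stop_reason = None
    for event in events:
        if "contentBlockDelta" in event:
            # .get() avoids a KeyError when the delta holds something other than text
            text = event["contentBlockDelta"].get("delta", {}).get("text")
            if text:
                parts.append(text)
        elif "messageStop" in event:
            stop_reason = event["messageStop"].get("stopReason")
    return "".join(parts), stop_reason


# Hypothetical usage with the response from converse_stream:
# full_text, stop_reason = collect_stream_text(response["stream"])
```

Because the helper accepts any iterable of dictionaries, you can exercise it with handwritten events before you call the service.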