
Bedrock Llama 3 70B: Why is the maximum generation length 2048 when Llama 3 supports more output tokens?


I am curious to understand why Llama 3 70B on Bedrock is restricted to a 2048 output token length.

Is there a way to increase this limit for my account? Also, I sometimes get an exception related to the number of calls I make. Is there a way to handle this before moving to production?

Please help.

1 Answer

Documentation up front:

https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html
https://github.com/facebookresearch/llama/blob/main/llama/model.py

The 2048 max output token limit comes from the Meta Llama 3 model itself, as shown in the second link, which is the actual model code. While I cannot confirm for sure, it appears the current Bedrock Llama deployment has kept that default value.
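To make the limit concrete, here is a minimal sketch of building the `invoke_model` request body for Llama 3 on Bedrock, using the parameter names from the Meta model parameters page linked above (`prompt`, `max_gen_len`, `temperature`, `top_p`). The helper function and the local cap check are illustrative additions, not part of the Bedrock SDK; they mirror the validation error the service would return if you exceed the cap.

```python
import json

# Current Bedrock cap on output tokens for Llama 3 (per the linked docs).
MAX_GEN_LEN_LIMIT = 2048


def build_llama3_body(prompt: str, max_gen_len: int = 512) -> str:
    """Return the JSON body for a bedrock-runtime invoke_model call.

    Raises ValueError locally if max_gen_len exceeds the 2048 cap,
    mirroring the validation error Bedrock would otherwise return.
    """
    if max_gen_len > MAX_GEN_LEN_LIMIT:
        raise ValueError(
            f"max_gen_len={max_gen_len} exceeds the Bedrock Llama 3 "
            f"limit of {MAX_GEN_LEN_LIMIT}"
        )
    return json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": 0.5,
        "top_p": 0.9,
    })


# Sending the request requires AWS credentials, so it is shown
# commented out here:
#
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.invoke_model(
#     modelId="meta.llama3-70b-instruct-v1:0",
#     body=build_llama3_body("Explain KV caching.", max_gen_len=2048),
# )
```

Setting `max_gen_len=2048` is accepted; anything above it fails validation, which is why raising the value client-side has no effect.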

Because this limit is part of the model deployment itself, it is not a service quota you can increase on your own. You can submit a support ticket asking the service team to raise the limit in the Bedrock Llama 3 deployment.

Another option is to use SageMaker JumpStart or another AWS service to stand up your own deployment of the Llama 3 model. Since the model weights are openly available, you could configure the output token length to suit your needs.

AWS
answered 2 years ago
