Meta Llama 3 8B results differ from expectations when triggered from Lambda vs. the Bedrock Playground and other model execution testing


Hey AWS folks! I'm testing out one of the Bedrock foundation models. Specifically, I'm trying to trigger the Meta Llama 3 8B model from a Lambda function. Throughout my personal testing with this model on the llama2.ai website (which now runs Llama 3 8B by default) and in the Bedrock Playground, I've gotten pretty consistent and exciting results. So, I assumed it would probably be the winner when I baked an auto-regressive LLM into my business use case. But when I went to test this model from a Lambda function with identical model parameters, I got significantly different results. When run from the Lambda function, the results often include superfluous text that's unwanted from both a technical and a cost standpoint. Below are my model parameters (excluding the prompt, which you'll have to trust me is the same in both cases :) ):

max_gen_len = 512
temperature = 0.75
top_p = 0.9
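
For context, here's roughly what my Lambda sends via invoke_model (a simplified sketch, not my exact code; I believe the model ID below is the Llama 3 8B Instruct one, but adjust as needed):

import json
import boto3

# Bedrock Runtime client used inside the Lambda function.
bedrock = boto3.client("bedrock-runtime")

prompt = "..."  # same prompt I use in the Playground

# The raw prompt goes straight into the request body -- no extra formatting.
body = {
    "prompt": prompt,
    "max_gen_len": 512,
    "temperature": 0.75,
    "top_p": 0.9,
}

response = bedrock.invoke_model(
    modelId="meta.llama3-8b-instruct-v1:0",
    body=json.dumps(body),
)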

I understand that, given the nature of LLMs and the temperature setting, there will be variability from one model execution to the next. However, the superfluous result content (for example: long sequences of unnecessary pipe characters, lengthy prose before and after the result I actually want, and other content that doesn't really fit the prompt) is a real problem, and the variability goes well beyond what I'd expect given my testing with the model thus far in the Bedrock Playground and outside of AWS.

Any ideas as to what might be happening here?

Asked 3 months ago · 385 views
3 Answers

Hello,

It seems the issue might be due to differences in hidden system prompts or configurations between Lambda and the Bedrock Playground. Double-check that both environments are using identical prompt setups, including any system prompts or hidden parameters that might be influencing the output. For detailed guidance on how system prompts can impact results, you can refer to this article: https://repost.aws/articles/AR-LV1HoR_S0m-qy89wXwHmw/the-leverage-of-llm-system-prompt-by-knowledge-bases-for-bedrock-in-rag-workflows

Answered 3 months ago by an Expert · Expert reviewed 3 months ago

Hi,

Are you sure that you prompt exactly the same way from Lambda and the Meta website? Are you sure, for example, that Meta doesn't include a system prompt you don't see but that provides guidance to the LLM?

See my article, which measures how much such a system prompt, with very detailed guidance, can impact results: https://repost.aws/articles/AR-LV1HoR_S0m-qy89wXwHmw/the-leverage-of-llm-system-prompt-by-knowledge-bases-for-bedrock-in-rag-workflows
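
To make that concrete, a hidden system prompt in Llama 3's chat template would be injected ahead of the user turn, roughly like this (the system text below is purely a hypothetical example):

# Hypothetical hidden system prompt preceding the user turn in Llama 3's chat template.
system_prompt = "You are a helpful assistant. Keep answers short and on topic."
prompt = "..."  # the user prompt you actually send

formatted_prompt = f"""
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{system_prompt}
<|eot_id|><|start_header_id|>user<|end_header_id|>
{prompt}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""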

Best,

Didier

Answered 3 months ago by an AWS Expert · Expert reviewed 3 months ago
Accepted Answer

Yes, I'm sure my prompt is the same across the places where I'm executing the model. I figured it out. Hopefully, this helps the next person who runs into this. Simply put, for this model to work properly when executed from a Lambda function, the prompt needs to be nested inside some formatting text, as in the Python example below. Without doing this, the model can produce erratic results. A full code example can be found here (https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-runtime_example_bedrock-runtime_InvokeModel_MetaLlama3_section.html).

AWS folks - it's worth noting that part of my confusion here stemmed from the fact that the Bedrock documentation in the AWS console has an "API Request" section at the bottom of each foundation model. In the Meta Llama 3 8B case, at least, that section was somewhat misleading. That is, if you want to run the model successfully, you need more than the set of parameters listed in Bedrock for the FM.

Parting thoughts: I'm guessing that both the AWS Playground and the website I linked programmatically format user prompts as below. That would explain the discrepancy.

# Embed the prompt in Llama 3's instruction format.
formatted_prompt = f"""
<|begin_of_text|>
<|start_header_id|>user<|end_header_id|>
{prompt}
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
Answered 3 months ago
  • I'm having the same situation that you had, but even after updating the instruction format I see major differences between the results provided by Bedrock and the same model deployed to a SageMaker endpoint via JumpStart (also same prompt and same inference parameters). Did you find any other major differences, or did changing the prompt format alone solve your issue?
