Meta Llama 3 8B results differ from expectations when invoked from Lambda vs. the Bedrock Playground and other model testing


Hey AWS folks! I'm testing out one of the Bedrock foundation models. Specifically, I'm trying to invoke the Meta Llama 3 8B model from a Lambda function. Throughout my personal testing with this model on the llama2.ai website (which now runs Llama 3 8B by default) and in the Bedrock Playground, I've gotten pretty consistent and exciting results, so I assumed it would probably be the winner when I baked an auto-regressive LLM into my business use case. However, when I tested the model from a Lambda function with identical model parameters, I got significantly different results. When run from Lambda, the output often includes superfluous text that's unwanted from both a technical and a cost standpoint. Below are my model parameters (excluding the prompt, which you'll have to trust me is the same in both cases :) ):

max_gen_len = 512
temperature = 0.75
top_p = 0.9
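For reference, these parameters map onto the JSON body that the Bedrock InvokeModel API takes for Meta Llama models (`prompt`, `max_gen_len`, `temperature`, `top_p`). The following is a minimal sketch; the helper name `build_llama_request_body` and the sample prompt are hypothetical:

```python
import json

# Inference parameters from the question (same values as in the Playground).
MAX_GEN_LEN = 512
TEMPERATURE = 0.75
TOP_P = 0.9


def build_llama_request_body(prompt: str) -> str:
    """Return the JSON request body for a Meta Llama model on Bedrock.

    Hypothetical helper: the prompt string is placed in the body exactly
    as given, with no chat template applied.
    """
    return json.dumps(
        {
            "prompt": prompt,
            "max_gen_len": MAX_GEN_LEN,
            "temperature": TEMPERATURE,
            "top_p": TOP_P,
        }
    )


print(build_llama_request_body("Summarize this ticket in one sentence."))
```

This body wraps nothing around the `prompt` field, so whatever string is passed in is exactly what the model receives.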

I understand that, given the nature of LLMs and the temperature setting, there will be variability from one model execution to the next. However, the superfluous content in the results (for example, long sequences of unnecessary pipe characters, lengthy prose before and after the result I actually want, and other content that doesn't make sense given the prompt) is a real problem, and the variability goes well beyond what I expected based on my testing with the model so far in the Bedrock Playground and outside of AWS.

Any ideas as to what might be happening here?

Asked 3 months ago · 375 views
3 answers

Hello,

It seems the issue might be due to differences in hidden system prompts or configurations between Lambda and the Bedrock Playground. Double-check that both environments are using identical prompt setups, including any system prompts or hidden parameters that might be influencing the output. For detailed guidance on how system prompts can impact results, you can refer to this article: https://repost.aws/articles/AR-LV1HoR_S0m-qy89wXwHmw/the-leverage-of-llm-system-prompt-by-knowledge-bases-for-bedrock-in-rag-workflows
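One way to rule out hidden system prompts is to make any system prompt explicit in the request. Below is a minimal sketch of the Llama 3 chat format with an optional system turn. It assumes Meta's published Llama 3 special tokens (`<|begin_of_text|>`, `<|start_header_id|>`, `<|end_header_id|>`, `<|eot_id|>`). The function name `format_llama3_prompt` and the example texts are hypothetical:

```python
from typing import Optional


def format_llama3_prompt(user_prompt: str, system_prompt: Optional[str] = None) -> str:
    """Wrap a user prompt (and optional system prompt) in Llama 3 chat tokens."""
    parts = ["<|begin_of_text|>"]
    if system_prompt:
        # Explicit system turn: where a playground's hidden guidance would live.
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>"
        )
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user_prompt}<|eot_id|>")
    # Leave the assistant header open so the model generates the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


print(format_llama3_prompt("List three AWS regions.", system_prompt="Answer tersely."))
```

Running the same user prompt with and without a system turn is one way to gauge how much an unseen system prompt could account for the difference.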

Expert · answered 3 months ago · reviewed by an Expert 3 months ago

Hi,

Are you sure that you prompt exactly the same way from Lambda and from the Meta website? For example, are you sure that Meta doesn't include a system prompt that you don't see but that provides guidance to the LLM?

See my article for a measurement of how strongly such a system prompt, through very detailed guidance, can impact results: https://repost.aws/articles/AR-LV1HoR_S0m-qy89wXwHmw/the-leverage-of-llm-system-prompt-by-knowledge-bases-for-bedrock-in-rag-workflows

Best,

Didier

AWS Expert · answered 3 months ago · reviewed by two Experts 3 months ago
Accepted answer

Yes, I'm sure my prompt is the same everywhere I'm executing the model. I figured it out, and hopefully this helps the next person who runs into it. Simply put, for this model to work properly when invoked from a Lambda function, the prompt needs to be wrapped in Llama 3's instruction-format tokens, as in the Python example below. Without this, the model can produce erratic results. A full code example can be found here (https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-runtime_example_bedrock-runtime_InvokeModel_MetaLlama3_section.html).

AWS folks: it's worth noting that part of my confusion stemmed from the "API Request" section at the bottom of each foundation model's page in the Bedrock console documentation. In the Llama 3 8B case, at least, that section was somewhat misleading. To run the model successfully, you need more than the set of parameters listed there for the FM: the prompt itself also has to be wrapped in the model's instruction format.

Parting thoughts: I'm guessing that both the Bedrock Playground and the llama2.ai website programmatically format user prompts as shown below, which would explain the discrepancy.

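# Example user input (replace with your own prompt).
prompt = "Describe the purpose of a 'hello world' program in one line."

# Embed the prompt in Llama 3's instruction format.
formatted_prompt = f"""
<|begin_of_text|>
<|start_header_id|>user<|end_header_id|>
{prompt}
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""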
Answered 3 months ago
  • I'm in the same situation you were in, but even after updating the instruction format I see major differences between the results from Bedrock and from the same model deployed to a SageMaker endpoint via JumpStart (with the same prompt and the same inference parameters). Did you find any other major differences, or did changing the prompt format alone solve your issue?
