Getting inference error with Flan-T5 XXL BNB INT8 model: "Input payload must contain text_inputs key"

I am following this blog to do RAG with Kendra, an LLM, LangChain, and SageMaker JumpStart: https://aws.amazon.com/blogs/machine-learning/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/

I first tried the Flan-T5 XL model and got it working end to end, though performance was not as expected. When I switched to Flan-T5 XXL BNB INT8, I got this error:

ValueError: Error raised by inference endpoint: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{ "code": 400, "type": "InternalServerException", "message": "Input payload must contain text_inputs key." } ". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/jumpstart-dft-hf-text2text-flan-t5-xxl-bnb-int8 in account xxxx for more information.

I am using the kendra_retriever Python script from the GitHub repo linked in the blog (https://aws.amazon.com/blogs/machine-learning/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/). I suspect there is something I need to do to convert the input for the 8-bit model?

Please kindly offer some pointers. Thank you.

  • Hi Clara, can you please show me how your llm and ContentHandler are defined? The issue is with the code there.

2 Answers
Can you try this PR: https://github.com/aws-samples/amazon-kendra-langchain-extensions/pull/6/files

Or make the changes manually in samples/kendra_chat_flan_xxl.py. Change this:

input_str = json.dumps({"inputs": prompt, "parameters": model_kwargs})

to this:

input_str = json.dumps({"text_inputs": prompt, **model_kwargs})

Change this:

return response_json[0]["generated_text"]

to this:

return response_json["generated_texts"][0]

Make the corresponding changes in the file samples/kendra_retriever_flan_xxl.py.
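Putting the two changes together, a minimal sketch of the corrected ContentHandler might look like the following. This is a hypothetical standalone version for illustration; the real class in the repo subclasses LangChain's content-handler base class.

```python
import json


class ContentHandler:
    """Illustrative content handler for the Flan-T5 XXL BNB INT8 endpoint.

    The INT8 JumpStart endpoint expects a flat payload keyed by
    "text_inputs" (generation parameters at the top level, not nested
    under "parameters") and returns {"generated_texts": [...]} rather
    than [{"generated_text": ...}].
    """

    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        # Flat payload with the "text_inputs" key the endpoint requires.
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        # Response shape differs from the XL endpoint: a dict with a
        # "generated_texts" list instead of a list of dicts.
        response_json = json.loads(output)
        return response_json["generated_texts"][0]
```

The key difference from the XL handler is purely the payload and response shape; the content type and the rest of the LangChain wiring stay the same.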

AWS
answered 10 months ago
  • Thank you Wael_AWS for the suggestions. I made those two exact changes and still ran into the same error. Then I started to examine the environment variables used in the scripts.

    Re-issuing all of them (including AWS_REGION, KENDRA_INDEX_ID, FLAN_XXL_ENDPOINT) and restarting Streamlit manually resolved the issue. We previously had the env variables and the Streamlit startup in a script, and we are still trying to understand the root cause.

    Thank you for your suggestion!

Please change the text "inputs" in this line [https://github.com/aws-samples/amazon-kendra-langchain-extensions/blob/main/kendra_retriever_samples/kendra_chat_flan_xxl.py#L33] to "text_inputs". The ContentHandler class in the code differs between LLMs and their variants. Please refer to the input and output format expected by the model provider and adjust the ContentHandler accordingly.
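As a quick way to check the expected payload shape outside of LangChain, here is a hedged sketch: the helper name and parameters are illustrative, and the commented-out boto3 call only shows roughly where the body would be sent.

```python
import json


def build_flan_xxl_int8_payload(prompt: str, **model_kwargs) -> bytes:
    """Build a request body with the flat "text_inputs" key that the
    Flan-T5 XXL BNB INT8 JumpStart endpoint expects (illustrative helper)."""
    return json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")


# Invoking the endpoint with boto3 would look roughly like this (not run here;
# endpoint name taken from the error message in the question):
# import boto3
# client = boto3.client("sagemaker-runtime")
# response = client.invoke_endpoint(
#     EndpointName="jumpstart-dft-hf-text2text-flan-t5-xxl-bnb-int8",
#     ContentType="application/json",
#     Body=build_flan_xxl_int8_payload("Summarize: ...", max_length=100),
# )
# result = json.loads(response["Body"].read())["generated_texts"][0]
```

If the endpoint returns the 400 "Input payload must contain text_inputs key" error, the body being sent almost certainly still uses the old "inputs" key.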

AWS
EXPERT
answered 10 months ago
  • Thank you AWS-User-Nitin. I made the changes as suggested but was still getting the same error. However, re-setting some environment variables resolved the issue, and I was able to run inference with the XXL model. Thank you for your suggestion.
