
Is QAAccuracy in fmeval able to use the context data in a dataset, or is it meant to be ignored?


Based on the Question Answering section of the SageMaker Clarify documentation, I don't see any contextual data. For example:

  • Question: “where is the world's largest ice sheet located today?”
  • Ground truth: “Antarctica”
  • Generated answer: “in South America”
    • Score: 0
  • Generated answer: “in Antarctica”
    • Score: 1

However, when looking at a suggested dataset, such as BoolQ, the train.jsonl file has the following format:

{"question": "...", "title": "...", "answer": true, "passage": "..."}
{"question": "...", "title": "...", "answer": false, "passage": "..."}

Here the passage field provides the context for the answer. Is fmeval's QAAccuracy able to handle the context, or is the context meant to be ignored (the documentation has no mention of context)?

1 Answer
Accepted Answer

As I understand from this open GitHub issue, multi-variable prompt templates are not yet supported.

In this Streamlit demo app for prompt engineering/optimization (which accepts arbitrary input QA datasets, as long as they have a single reference-answer column), we worked around this by doing the prompt templating ourselves as a pre-processing step before FMEval. You can find screenshots and a walkthrough of the demo app's user experience here.
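To illustrate the pre-processing idea, here is a minimal sketch (not the demo app's actual code) that folds the BoolQ passage and question into a single input field per record. The template wording, the output field names model_input and target_output, and the conversion of the boolean answer to the strings "True"/"False" are all assumptions made for illustration; adjust them to whatever your model and evaluation setup expect.

```python
import json

# Hypothetical template: the wording is an assumption, not something fmeval prescribes.
TEMPLATE = (
    "Answer the question using the passage.\n"
    "Passage: {passage}\n"
    "Question: {question}\n"
    "Answer:"
)


def build_record(row: dict) -> dict:
    """Merge passage and question into one prompt string for a single record."""
    prompt = TEMPLATE.format(passage=row["passage"], question=row["question"])
    # BoolQ stores the answer as a JSON boolean; the reference answer is
    # written as a string here ("True"/"False"), which is an assumption.
    return {"model_input": prompt, "target_output": str(row["answer"])}


def preprocess_lines(lines):
    """Yield one templated JSON Lines record per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(build_record(json.loads(line)))


def preprocess_file(src_path: str, dst_path: str) -> None:
    """Read a BoolQ-style .jsonl file and write the templated version."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for record in preprocess_lines(src):
            dst.write(record + "\n")
```

The resulting file has a single input column that already contains the context, so the evaluation only needs to be pointed at that column and at the reference-answer column, with a single-variable prompt template.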

I believe the fmeval team is aware that both multi-variable prompt templating in general and in-context/RAG-based question answering in particular are important to customers, so these features might be expanded in future. Adding your voice on GitHub or elsewhere may help drive their prioritization.

AWS
EXPERT
answered 2 years ago
EXPERT
reviewed 2 years ago
