I am experiencing an issue with metadata filtering in Amazon Bedrock. I set up a new collection and index within Amazon OpenSearch Serverless and created a knowledge base in Amazon Bedrock. The collection is of type "vectorsearch" and the index uses the "faiss" engine. The knowledge base uses a fixed-size chunking strategy with a chunk size of 1024 and is configured to use the OpenSearch collection mentioned above.
I uploaded sample PDF files and their corresponding metadata files to an S3 bucket. The metadata files have the following format:
{
  "metadataAttributes": {
    "caseId": "2"
  }
}
I followed the documentation to configure and sync the data source to the knowledge base. However, in the "Test knowledge base" module of the Bedrock console, when I try to filter documents by metadata attribute (e.g., caseId = "2"), I receive the following error:
"failed to create query: Rewrite first".
If I remove the quotation marks around "2", that error no longer appears, but the chatbot responds with
"Sorry, I am unable to assist you with this request."
I have ensured that:
- The metadata files are correctly named and formatted.
- The metadata files are in the same folder as their corresponding PDF files in S3.
- The data source is synced in the Bedrock knowledge base.
Despite these steps, the issue persists.
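For anyone who wants to run the checks above locally before uploading, here is a minimal sketch. It assumes the sidecar naming convention `<document name>.metadata.json` and the `metadataAttributes` wrapper shown earlier; the helper name `check_sidecar` is hypothetical and not part of any AWS tooling. It also reports the JSON type of each attribute value, because the string "2" and the number 2 are different types.

```python
import json
from pathlib import Path


def check_sidecar(doc_path: Path) -> dict:
    """Return {attribute name: Python type name} for a document's metadata file.

    Assumption: the metadata file sits next to the document and is named
    '<document name>.metadata.json' (for example, report.pdf.metadata.json).
    """
    sidecar = doc_path.with_name(doc_path.name + ".metadata.json")
    if not sidecar.is_file():
        raise FileNotFoundError(f"missing metadata file: {sidecar}")

    data = json.loads(sidecar.read_text(encoding="utf-8"))
    attrs = data.get("metadataAttributes")
    if not isinstance(attrs, dict):
        raise ValueError(f"{sidecar} has no 'metadataAttributes' object")

    # json.loads maps JSON strings to str and JSON integers to int,
    # so the type name shows how the value was written in the file.
    return {key: type(value).__name__ for key, value in attrs.items()}
```

Running this against a local copy of the bucket contents shows whether `caseId` was written as a string or as a number, which should match the data type declared for the field in the vector index.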
Does anyone have any insight into how to fix this issue?
Hello,
Thank you very much for your response. I apologize for not making the formatting of my question clearer. I did try removing the quotes around the "2". When I do this, the LLM chatbot responds with, "Sorry, I am unable to assist you with this request."
My hypothesis is that when I remove the quotes, Bedrock is not retrieving the documents with metadata where case = 2 because when I set up the metadata fields in the vector index, I set the data type of case to string. I would infer that 2 has a data type of number and not string, but maybe not in this case.
Furthermore, please take a look at this documentation page: https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html
Underneath the "logical operators" table and inside the "note" box, the documentation states, "You must surround strings with quotation marks."
If I set the data type of case to string, should I not put the quotation marks?
Thank you for your help,
Jordan
I deleted the collection, index, and knowledge base. I changed the metadata JSON to the format:
{ "metadataAttributes": { "caseId": 2 } }
without the double quotes. When recreating the metadata fields in the index, I made sure to set the case data type to integer. I think this was the determining factor. Now when I set the metadata filter to case = 2 when testing the knowledge base, the query returns a response with the correct filtered documents! Thank you!
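For anyone applying the same filter outside the console, here is a rough sketch of the equivalent request for the `retrieve` operation of the boto3 `bedrock-agent-runtime` client. I only tested in the console, so treat this as an assumption rather than a verified setup: the function names, the knowledge base ID, and the query text are placeholders, and the key `caseId` is taken from the metadata file. The filter value is passed as an integer to match the metadata and the index field type.

```python
def build_retrieve_request(kb_id: str, query: str, key: str, value, top_k: int = 5) -> dict:
    """Build keyword arguments for a knowledge base retrieve call with an 'equals' filter.

    The value keeps its Python type: 2 is sent as a number, "2" as a string,
    so it should match how the attribute was written in the metadata file.
    """
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": top_k,
                "filter": {"equals": {"key": key, "value": value}},
            }
        },
    }


def run_query(kb_id: str, query: str) -> list:
    """Send the filtered query; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so the request builder works without boto3

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve(**build_retrieve_request(kb_id, query, "caseId", 2))
    return response["retrievalResults"]
```

Calling `run_query("YOUR_KB_ID", "your question")` with a real knowledge base ID should return only chunks whose `caseId` is the integer 2, provided the metadata and index are set up as described above.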