
Multi-Valued Metadata Attributes with Bedrock Knowledge Base using Aurora Serverless


I'm using Amazon Bedrock with Aurora Serverless as the vector store for a knowledge base. I've successfully ingested documents with single-valued metadata fields (using "stringValue": "..."), but I'm running into an error when attempting to include an array field using "stringListValue": [ "val1", "val2" ]. The ingestion job fails with the message:

Encountered invalid metadata attributes/fields. Check that the attribute keys and values
don't exceed the character quota, and that the attribute values are acceptable data types
(strings, numbers, or Booleans), then retry

Current Implementation:

  1. Aurora Serverless table has a column of type text[], e.g. env text[].
  2. Sample ingestion metadata (on S3):
    {
      "metadataAttributes": {
        "env": {
          "value": {
            "stringListValue": ["dev", "stage"],
            "type": "STRING_LIST"
          },
          "includeForEmbedding": true
        }
      }
    }
  3. Other simple string metadata fields work, but when including "stringListValue" for env, the ingestion fails with "RESOURCE_IGNORED".

Is stringListValue (multi-valued metadata) officially supported in Bedrock Knowledge Bases that use Aurora Serverless as the vector store? If so, how should I configure the ingestion request and/or Aurora schema to accept and properly index an array attribute? If not, what is the recommended workaround for storing and querying multiple values in a single metadata field?

Looking at the Boto3 Documentation, it appears that the MetadataAttributeValue can be a stringListValue. I think that the knowledge base setup docs should be clearer on the differences between the different vector database options.

1 Answer
  1. Aurora Serverless and Array Handling: Aurora Serverless (PostgreSQL-compatible) does support array types like text[], which should in theory handle multi-valued metadata attributes like stringListValue. However, there may be a problem with how you're configuring the ingestion request or with how Bedrock interacts with Aurora Serverless in this context.

  2. Understanding the Error: The RESOURCE_IGNORED status generally indicates a problem with how metadata attributes are formatted or structured, especially arrays. It suggests that the ingestion process is not recognizing or correctly interpreting the multi-valued attribute.

  3. Data Types and Compatibility: In the metadata structure you're sending to Amazon Bedrock, you're using stringListValue for multi-valued metadata. This is valid in the MetadataAttributes for some vector databases, but Aurora Serverless might expect the values in a different format (e.g., a string representation of an array or a single concatenated string).

  4. Recommended Workaround: Since multi-valued metadata (like stringListValue) is supported in Amazon Bedrock but may need special handling for Aurora Serverless, here are some potential workarounds:

a) Flatten the Array: Instead of sending the array as a stringListValue, you could try flattening it into a single string. Join the values with a separator (e.g., comma or semicolon) and store the resulting string in the Aurora database.

For example, instead of sending:

    "metadataAttributes": {
      "env": {
        "value": {
          "stringListValue": ["dev", "stage"],
          "type": "STRING_LIST"
        },
        "includeForEmbedding": true
      }
    }

You could try:

    "metadataAttributes": {
      "env": {
        "value": {
          "stringValue": "dev,stage",
          "type": "STRING"
        },
        "includeForEmbedding": true
      }
    }

Then, in your Aurora Serverless schema, you can store this as a TEXT or VARCHAR field (e.g., env VARCHAR). This approach would allow you to store multiple values in a single field and then later split the string by the separator when needed for querying.

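The flattening step in (a) can be sketched in Python. This is a minimal sketch, assuming the metadata document has the shape shown in the question; `flatten_list_attributes`, `split_flattened`, and the comma separator are hypothetical names and choices for illustration, not part of any Bedrock API.

```python
import json

SEPARATOR = ","  # assumption: pick a character that never occurs inside a value


def flatten_list_attributes(metadata: dict, separator: str = SEPARATOR) -> dict:
    """Return a copy of the metadata with list values joined into single strings."""
    flattened = {}
    for key, attribute in metadata.get("metadataAttributes", {}).items():
        value = attribute.get("value", {}) if isinstance(attribute, dict) else {}
        if "stringListValue" not in value:
            flattened[key] = attribute  # scalar attributes pass through unchanged
            continue
        items = value["stringListValue"]
        for item in items:
            if separator in item:
                # Joining would make this value impossible to split back out.
                raise ValueError(f"{key!r} value {item!r} contains the separator")
        new_attribute = dict(attribute)
        new_attribute["value"] = {
            "stringValue": separator.join(items),
            "type": "STRING",
        }
        flattened[key] = new_attribute
    return {"metadataAttributes": flattened}


def split_flattened(value: str, separator: str = SEPARATOR) -> list:
    """Recover the original list from a flattened string."""
    return value.split(separator) if value else []


original = {
    "metadataAttributes": {
        "env": {
            "value": {"stringListValue": ["dev", "stage"], "type": "STRING_LIST"},
            "includeForEmbedding": True,
        }
    }
}

converted = flatten_list_attributes(original)
print(json.dumps(converted, indent=2))
```

The separator check is there because a value that itself contains the separator would silently split into extra entries when read back.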
b) Use a Custom Encoding for Lists: If you want to keep the array format, you could encode the array into a string format that can be stored in a TEXT field. This could involve serializing the array into a JSON string (note that the inner quotes must be escaped), for instance:

    "metadataAttributes": {
      "env": {
        "value": {
          "stringValue": "{\"values\": [\"dev\", \"stage\"]}",
          "type": "STRING"
        },
        "includeForEmbedding": true
      }
    }

In your Aurora schema, you could store this as a TEXT field. When querying or retrieving the metadata, you could parse the JSON string back into an array by casting it to jsonb and using the jsonb functions provided by PostgreSQL.

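Writing the escaped JSON string by hand is error-prone, so it may be easier to generate it. The following is a minimal sketch, assuming the `{"values": [...]}` wrapper used in the example above; `encode_list` and `decode_list` are hypothetical helper names.

```python
import json


def encode_list(values: list) -> str:
    """Serialize a list into the JSON string stored in stringValue."""
    return json.dumps({"values": values})


def decode_list(encoded: str) -> list:
    """Parse the stored JSON string back into a list."""
    return json.loads(encoded)["values"]


attribute = {
    "value": {"stringValue": encode_list(["dev", "stage"]), "type": "STRING"},
    "includeForEmbedding": True,
}

# json.dumps escapes the inner quotes when the whole document is serialized.
document = json.dumps({"metadataAttributes": {"env": attribute}}, indent=2)
print(document)
```

Reading the value back is the reverse: `decode_list(attribute["value"]["stringValue"])` returns the original list.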
c) Ensure Column Type Compatibility: Make sure that the column env in your Aurora schema is of a suitable type. If you're trying to store arrays directly (e.g., text[] in PostgreSQL), ensure the metadata ingestion process is also expecting that type. If necessary, switch to storing a single string (as in the examples above) rather than an array type in the database.
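
Before re-running ingestion, it may help to check which attributes still carry list values, since the error message lists only strings, numbers, and Booleans as acceptable types. This is a rough sketch, assuming the metadata shape from the question; `list_valued_attributes` is a hypothetical helper, not a Bedrock feature.

```python
def list_valued_attributes(metadata: dict) -> list:
    """Return the names of attributes whose value is a list rather than a scalar."""
    found = []
    for key, attribute in metadata.get("metadataAttributes", {}).items():
        if isinstance(attribute, list):
            found.append(key)  # a bare list given directly as the attribute value
            continue
        value = attribute.get("value", {}) if isinstance(attribute, dict) else {}
        if value.get("type") == "STRING_LIST" or "stringListValue" in value:
            found.append(key)
    return sorted(found)


sample = {
    "metadataAttributes": {
        "env": {
            "value": {"stringListValue": ["dev", "stage"], "type": "STRING_LIST"},
            "includeForEmbedding": True,
        },
        "team": {
            "value": {"stringValue": "platform", "type": "STRING"},
            "includeForEmbedding": False,
        },
    }
}

print(list_valued_attributes(sample))
```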

regards, M Zubair https://zeonedge.com

answered a year ago
