Using `STRING_LIST` metadata with S3 Vectors + Bedrock Knowledge Bases

0

According to this doc, S3 Vectors supports string, number, boolean, and list metadata types. However, when using the list metadata type, I am encountering an error when ingesting into a Bedrock Knowledge Base -

Encountered error: Ignored 10 files due to invalid metadata attributes. Check that the attribute keys and values don't exceed the character quota, and that the attribute values are acceptable data types (strings, numbers, or Booleans). Then retry your request

I've created my vector bucket and index as follows -

import boto3
client = boto3.client("s3vectors")
client.create_vector_bucket(vectorBucketName="bucket-name")
client.create_index(
    vectorBucketName="bucket-name", indexName="index-name",
    dataType="float32", dimension=1024, distanceMetric="euclidean",
    metadataConfiguration={"nonFilterableMetadataKeys": ["AMAZON_BEDROCK_TEXT", "AMAZON_BEDROCK_METADATA"]},
)

And the files in my S3 bucket data source each have a filename.metadata.json file like the following -

{
    "metadataAttributes": {
        "key": {
            "value": {
                "type": "STRING_LIST",
                "stringListValue": ["tag"]
            },
            "includeForEmbedding": false
        }
    }
}
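
Strict JSON does not permit trailing commas, so generating the sidecar file programmatically is one way to rule out a syntax problem. Below is a minimal Python sketch, assuming the attribute layout shown above. The helper name `build_metadata` is hypothetical and not part of any AWS SDK, and the sketch only handles list-of-string values.

```python
import json


def build_metadata(attributes):
    """Wrap lists of strings in the sidecar layout shown above.

    Illustrative helper only; it is not part of any AWS SDK.
    """
    wrapped = {}
    for key, values in attributes.items():
        wrapped[key] = {
            "value": {"type": "STRING_LIST", "stringListValue": list(values)},
            "includeForEmbedding": False,
        }
    return {"metadataAttributes": wrapped}


doc = build_metadata({"key": ["tag"]})
text = json.dumps(doc, indent=4)

# Round-trip through the parser to confirm the serialized text is valid JSON.
assert json.loads(text) == doc
print(text)
```

Writing `text` to `filename.metadata.json` would then give a file that at least parses cleanly, which separates syntax problems from data-type problems.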

But when I ingest this data source into my Bedrock Knowledge Base, I get the above error. However, when I use the s3vectors-embed CLI, it works -

s3vectors-embed put --vector-bucket-name bucket-name --index-name index-name --model-id cohere.embed-english-v3 --text-value "Test" --metadata '{"key": ["tag"]}'

So I don't believe the metadata itself is the problem; rather, it seems to be an issue with the way the Bedrock ingest command works. Am I missing something here?
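
For comparison, the direct write path that the CLI takes can be sketched in Python as well. This is a rough sketch, assuming the bucket and index names from above. The vector key `doc-1` and the all-zero embedding are placeholders, and the helper `build_put_vectors_request` is hypothetical. The request is only assembled here and is not sent.

```python
def build_put_vectors_request(key, embedding, metadata):
    """Assemble keyword arguments for the S3 Vectors put_vectors call."""
    return {
        "vectorBucketName": "bucket-name",
        "indexName": "index-name",
        "vectors": [
            {
                "key": key,
                "data": {"float32": [float(x) for x in embedding]},
                # List-valued metadata is passed as a plain JSON array.
                "metadata": metadata,
            }
        ],
    }


# Placeholder embedding; a real one would come from the embedding model.
placeholder = [0.0] * 1024
request = build_put_vectors_request("doc-1", placeholder, {"key": ["tag"]})
print(request["vectors"][0]["metadata"])

# To send it (requires AWS credentials and the bucket/index above):
#   import boto3
#   boto3.client("s3vectors").put_vectors(**request)
```

Here the list is handed to S3 Vectors as a plain array, with no `type`/`stringListValue` wrapper. That wrapper only appears in the Bedrock sidecar file.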

5 Answers
1

Hi, I'd like to bump this issue. I created my index like this:

aws s3vectors create-index \
  --vector-bucket-name "s3v2" \
  --index-name "custom-chunking" \
  --data-type "float32" \
  --dimension 1024 \
  --distance-metric "cosine" \
  --metadata-configuration '{"nonFilterableMetadataKeys":["AMAZON_BEDROCK_TEXT", "AMAZON_BEDROCK_METADATA"]}'

And I am having a similar issue when trying to set up a knowledge base using S3 Vectors.

{
  "metadataAttributes": {
    "topics": ["lalalala", "lololo"],
    "id": 730,
    "start_year": 1990
  }
}

This throws the error: Encountered error: Ignored 1 files due to invalid metadata attributes. Check that the attribute keys and values don't exceed the character quota, and that the attribute values are acceptable data types (strings, numbers, or Booleans). Then retry your request [Files: s3://xx/xx/730-1.txt]. Call to Customer Source did not succeed.

I have also tried a more typed metadata file:

{
  "metadataAttributes": {
    "topics": {
      "type": "STRING_LIST",
      "stringListValue": ["lalala", "lololo"]
    },
    "id": {
      "type": "NUMBER",
      "numberValue": 730
    },
    "start_year": {
      "type": "NUMBER",
      "numberValue": 1990
    }
  }
}

In my opinion, it is most likely related to the STRING_LIST present in the metadata, as turning it into a single string or deleting it altogether makes the error disappear. Is there maybe some guide as to how to format lists within metadata for S3 Vectors?
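
The single-string workaround mentioned above can be sketched as follows. This is a minimal Python sketch. The helper name `flatten_lists` and the comma delimiter are assumptions, and the attribute is then stored as an ordinary string rather than a list.

```python
import json


def flatten_lists(metadata_attributes, delimiter=","):
    """Return a copy in which every list value is joined into one string."""
    flattened = {}
    for key, value in metadata_attributes.items():
        if isinstance(value, list):
            flattened[key] = delimiter.join(str(item) for item in value)
        else:
            flattened[key] = value
    return flattened


original = {"topics": ["lalalala", "lololo"], "id": 730, "start_year": 1990}
sidecar = {"metadataAttributes": flatten_lists(original)}
print(json.dumps(sidecar, indent=2))
```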

Thank you in advance.

answered 8 months ago
0

Experiencing the exact same issue here, and I cannot find any quality examples where STRING_LIST is used. Is STRING_LIST actually supported for Bedrock Knowledge Bases? Please supply the correct documentation here. We shouldn't have to reverse engineer the behavior: either it's supported or it isn't.

answered 8 months ago
  • According to this doc, S3 Vectors supports list type metadata.

    S3 Vectors supports string, number, boolean, and list types of metadata with a size limit per vector.

    Bedrock supports list type metadata filtering too, according to this doc - there exist in and notIn operators for list type filtering.

    It works when using OpenSearch as a backend, so it's not a problem with Bedrock. It's a problem with S3 Vectors.

0

Hello. Actually, Amazon Bedrock Knowledge Bases does not support the STRING_LIST data type when S3 Vectors is the vector store. However, if you use another vector store such as PostgreSQL, STRING_LIST works perfectly.

answered 7 months ago
0

This issue seems to have been addressed recently as it's now working for me. Maybe someone else can confirm?

answered 7 months ago
-1

Based on the information provided, there appears to be a discrepancy between how S3 Vectors handles metadata types directly versus how Bedrock Knowledge Base processes them during ingestion.

While S3 Vectors does support list metadata (along with string, number, and boolean), there seems to be an issue with the format you're using in your metadata JSON file for Bedrock Knowledge Base ingestion.

The error message indicates that Bedrock Knowledge Base might be expecting a different format for metadata attributes. When using the MetadataAttributeValue structure with Bedrock, the STRING_LIST type requires specific formatting. For STRING_LIST type, you should use the stringListValue field, which is an array of strings with a minimum of 1 item and a maximum of 10 items. Each string in the array has length constraints (minimum 1, maximum 2048).

Your current metadata JSON structure might not be compatible with what Bedrock Knowledge Base expects. The CLI command works because it's directly interfacing with S3 Vectors, while the Bedrock ingestion process has its own requirements for metadata formatting.

Try modifying your metadata JSON file to ensure it follows the exact structure expected by Bedrock Knowledge Base. You might need to adjust how you're specifying the STRING_LIST type and its values to match Bedrock's requirements.

Also, verify that your metadata attributes don't exceed any character quotas and that the values are properly formatted for their respective types. For STRING_LIST specifically, ensure you're not exceeding the limit of 10 items per list and that each string is within the allowed length constraints.
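
The limits quoted above (1 to 10 items, each string 1 to 2048 characters) can be checked before upload. Below is a minimal Python sketch. The function name `validate_string_list` is hypothetical, and passing this check does not by itself guarantee that ingestion will succeed.

```python
MAX_ITEMS = 10          # maximum number of strings per list, as quoted above
MAX_STRING_LEN = 2048   # maximum length of each string, as quoted above


def validate_string_list(values):
    """Return a list of problems found; an empty list means the value passes."""
    if not isinstance(values, list):
        return ["value must be a list"]
    problems = []
    if not 1 <= len(values) <= MAX_ITEMS:
        problems.append(f"expected 1 to {MAX_ITEMS} items, got {len(values)}")
    for index, item in enumerate(values):
        if not isinstance(item, str):
            problems.append(f"item {index} is not a string")
        elif not 1 <= len(item) <= MAX_STRING_LEN:
            problems.append(f"item {index} has length {len(item)}")
    return problems


print(validate_string_list(["tag"]))
print(validate_string_list(["a"] * 11))
```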
Sources
MetadataAttributeValue - Amazon Bedrock
MetadataAttributeSchema - Amazon Bedrock
Metadata filtering - Amazon Simple Storage Service

answered 9 months ago
