S3 Vector, big metadata error

1

Hello,

When I try to pass metadata as shown below (also taking this question into account):

s3vectors-embed put --vector-bucket-name test-vector-bucket --index-name titan-vector-index --model-id amazon.titan-embed-text-v2:0 --text "\test.json"  --metadata "{\"source\": \"huggingface\"}"

I have the below error:

Error: Failed to process text input: S3 Vectors put_vectors failed: An error occurred (ValidationException) when calling the PutVectors operation: Invalid record for key '62d92fcd-1de2-4341-9058-28776bc287b2': Filterable metadata must have at most 2048 bytes

even though the syntax follows this example, which uses longer metadata:

s3vectors-embed put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id amazon.titan-embed-text-v2:0 \
  --text "s3://my-bucket/sample.txt" \
  --metadata '{"category": "technology", "version": "1.0"}'

UPDATE:
The same metadata error appears even when no metadata is specified:

s3vectors-embed put --vector-bucket-name test-vector-bucket --index-name titan-vector-index --model-id amazon.titan-embed-text-v2:0 --text "\test.json"

When I ran the query:

s3vectors-embed query --vector-bucket-name test-vector-bucket --index-name titan-vector-index --model-id amazon.titan-embed-text-v2:0 --query-input "query text"  --k 10

The successful output contains, as metadata, the text that was passed with --text-value (some related info here):

{
  "results": [
    {
      "key": "2b323e7f-bc4f-4401-bb9a-967963feab53",
      "metadata": {
        "S3VECTORS-EMBED-SRC-CONTENT": "Hello, mr. anderson !"
      }
    }
  ],
  "summary": {
    "queryType": "text",
    "model": "amazon.titan-embed-text-v2:0",
    "index": "titan-vector-index",
    "resultsFound": 1,
    "queryDimensions": 1024
  }
}

Thank you,

  • The same error occurred to me, and I fixed it by adding a key to the metadata configurations of the vector index. I don't know if it's relevant or not, but I gave the key the same name as the index.

  • Were you able to solve this issue? I'm having a similar issue using S3 Vector database, except my files don't have metadata associated with them, just the text file. I am using the S3 bucket upload feature to upload the files and test a datasource for a knowledgebase.

  • Hey all, check out https://github.com/awslabs/s3vectors-embed-cli/pull/9 - trying to squash this bug, this addresses the root cause. Thanks!

5 Answers
0
  • In S3 Vectors, by default, all metadata keys are considered filterable. This means that while performing a query on the index, you can specify filters to only return specific data. According to the S3 Vectors Limits docs, Filterable metadata per vector is up to 2 KB (2048 bytes). Additionally, total metadata per vector, filterable + non-filterable, is up to 40 KB per vector.

  • To resolve the issue, create a new vector index and include AMAZON_BEDROCK_TEXT and AMAZON_BEDROCK_METADATA as nonFilterableMetadataKeys. This way, you can store up to 40 KB of metadata per vector.

sample:

metadataConfiguration={
        'nonFilterableMetadataKeys': [
            'AMAZON_BEDROCK_TEXT',                  # <<<< This shows how keys are passed as nonFilterable
            'AMAZON_BEDROCK_METADATA'
        ]
    }
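For context, the fragment above could be placed into a full index-creation request roughly as follows. This is a minimal sketch, assuming the boto3 `s3vectors` client and its `create_index` operation; the bucket name, index name, and distance metric are placeholders, the dimension of 1024 is taken from the query output earlier in the thread, and `build_create_index_request` is a hypothetical helper, not part of any AWS SDK.

```python
def build_create_index_request(bucket, index, dimension, non_filterable_keys):
    """Assemble keyword arguments for a create_index call (hypothetical helper)."""
    return {
        "vectorBucketName": bucket,
        "indexName": index,
        "dataType": "float32",
        "dimension": dimension,
        "distanceMetric": "cosine",  # assumption: choose the metric your embeddings need
        "metadataConfiguration": {
            "nonFilterableMetadataKeys": list(non_filterable_keys),
        },
    }


request = build_create_index_request(
    bucket="test-vector-bucket",        # placeholder name
    index="titan-vector-index-2",       # placeholder name
    dimension=1024,
    non_filterable_keys=["AMAZON_BEDROCK_TEXT", "AMAZON_BEDROCK_METADATA"],
)

# Assumption: with AWS credentials configured, the request would be sent like this:
# import boto3
# boto3.client("s3vectors").create_index(**request)
```

Because non-filterable keys are fixed when the index is created, this has to be done on a new index rather than applied to an existing one.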
AWS
SUPPORT ENGINEER
answered a year ago
  • Thank you for your answer,
    Since the error also appears when no metadata is used, increasing the metadata space had no effect, and I still get the same error when I try to load data from a file.

  • Hello,
    One thing to note is that the syntax below works:

    s3vectors-embed put --vector-bucket-name "test-vector-bucket" --index-name "titan-vector-index-2" --model-id "amazon.titan-embed-text-v2:0"   
    --text-value "Hello Mr Andersen !"  
    --metadata '{"category": "test", "source": "huggingface", "tasks": "sentence similarity", "languages": "english", "modalities": "text"}'
    

    while this one does not, and I have the same error (Filterable metadata must have at most 2048 bytes):

    s3vectors-embed put --vector-bucket-name "test-vector-bucket" --index-name "titan-vector-index-2" --model-id "amazon.titan-embed-text-v2:0" 
    --text "\test.json"  
    --metadata '{"category": "test", "source": "huggingface", "tasks": "sentence similarity", "languages": "english", "modalities": "text"}'
    
  • I tried the fix, but it seems to degrade the agent's performance in retrieving information. Do you know of a fix that does not reduce performance? I am using Bedrock Data Automation as the parsing strategy.

0

The error you're encountering is related to the size limit for filterable metadata in S3 Vectors. The error message "Filterable metadata must have at most 2048 bytes" indicates that the metadata you're trying to attach to your vector exceeds the maximum allowed size for filterable metadata.

By default, all metadata in S3 Vectors is filterable unless explicitly specified as non-filterable during vector index creation. Filterable metadata has a size limit of 2048 bytes per vector. Even though your metadata example {"source": "huggingface"} appears small, there might be an issue with how it's being processed or encoded.

Here are some potential solutions:

  1. Make sure your metadata JSON is properly formatted. In your command, you're using escaped quotes (\") which might be causing issues. Try using single quotes around the entire JSON string and double quotes inside:
--metadata '{"source": "huggingface"}'
  2. If you need to store larger metadata, consider using non-filterable metadata keys. These must be specified during vector index creation and can't be changed later. Non-filterable metadata keys are ideal for storing larger amounts of data that don't need to be searchable but can be returned with query results.

  3. Check if there are any hidden characters or encoding issues in your metadata string that might be increasing its byte size.

  4. Verify that the total size of your metadata (including keys and values) doesn't exceed the 2048-byte limit for filterable metadata.
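The size and encoding checks above can be sketched locally before sending anything to the service. The thread does not say exactly how S3 Vectors counts bytes, so this sketch assumes the UTF-8 length of the compact JSON serialization as an approximation; `metadata_size_bytes` is a hypothetical helper name.

```python
import json

FILTERABLE_LIMIT_BYTES = 2048  # limit quoted in the error message


def metadata_size_bytes(metadata):
    """Approximate size: UTF-8 bytes of the compact JSON serialization (assumption)."""
    compact = json.dumps(metadata, separators=(",", ":"), ensure_ascii=False)
    return len(compact.encode("utf-8"))


small = {"source": "huggingface"}
accented = {"note": "é" * 1500}  # 1500 characters, but 2 bytes each in UTF-8

for name, doc in (("small", small), ("accented", accented)):
    size = metadata_size_bytes(doc)
    status = "OK" if size <= FILTERABLE_LIMIT_BYTES else "over limit"
    print(name, size, status)
```

The second document shows why character count is not a reliable guide: non-ASCII characters take more than one byte each, so a string that looks shorter than 2048 characters can still exceed the limit.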

Remember that S3 Vectors supports string, number, boolean, and list types of metadata, and the metadata is ideal for attributes that you want to filter on, such as categories, timestamps, or status values.
Sources
Metadata filtering - Amazon Simple Storage Service
Vectors - Amazon Simple Storage Service
Vector indexes - Amazon Simple Storage Service

answered a year ago
0

I am using Amazon Bedrock with an S3 Vectors store configured.

When I try to sync the document and its metadata (startingestionjob and documentingestion), it fails with: ValidationException: Invalid record for key 'someid': Filterable metadata must have at most 2048 bytes

My metadata.json file has the following content: {"metadataAttributes": {"source": "huggingface"}}

I have enabled CloudTrail and logging on Bedrock, but I could not find the root cause, and I am still getting the same error.

answered a year ago
0

Hello,
One thing to note is that the syntax below works:

s3vectors-embed put --vector-bucket-name "test-vector-bucket" --index-name "titan-vector-index-2" --model-id "amazon.titan-embed-text-v2:0"   
--text-value "Hello Mr Andersen !"  
--metadata '{"category": "test", "source": "huggingface", "tasks": "sentence similarity", "languages": "english", "modalities": "text"}'

while this one does not, and I have the same error (Filterable metadata must have at most 2048 bytes):

s3vectors-embed put --vector-bucket-name "test-vector-bucket" --index-name "titan-vector-index-2" --model-id "amazon.titan-embed-text-v2:0" 
--text "\test.json"  
--metadata '{"category": "test", "source": "huggingface", "tasks": "sentence similarity", "languages": "english", "modalities": "text"}'

The file test.json is 3.5 KB and consists of lines similar to this one:

{"query": "Does it work with t-mobile", "pos": [" I DONT RECOMMEND THIS PHONE!!!!"]}

When the file is only 85 B, the second syntax also works! So it seems that the text file is treated as metadata, or that the text is also added as metadata.

UPDATE:
The main cause of the error seems to be an index limitation: each index key can't hold text larger than 2048 bytes
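The observation can be illustrated with a rough calculation. This sketch assumes that the CLI stores the input text under the `S3VECTORS-EMBED-SRC-CONTENT` key (consistent with the query output earlier in the thread) and that the limit applies roughly to the UTF-8 JSON size of the filterable keys; `filterable_bytes` is a hypothetical helper, and the strings merely stand in for the 85 B and 3.5 KB files.

```python
import json

LIMIT = 2048  # bytes, from the error message
SRC_KEY = "S3VECTORS-EMBED-SRC-CONTENT"  # key seen in the query output in this thread


def filterable_bytes(metadata, non_filterable_keys=()):
    """Approximate UTF-8 JSON size of the keys that remain filterable (assumption)."""
    kept = {k: v for k, v in metadata.items() if k not in non_filterable_keys}
    return len(json.dumps(kept, ensure_ascii=False).encode("utf-8"))


user_metadata = {"category": "test", "source": "huggingface"}
short_text = "x" * 85    # stands in for the 85 B file
long_text = "x" * 3500   # stands in for the 3.5 KB file

for text in (short_text, long_text):
    record = {**user_metadata, SRC_KEY: text}
    default = filterable_bytes(record)
    excluded = filterable_bytes(record, non_filterable_keys=(SRC_KEY,))
    print(len(text), default <= LIMIT, excluded <= LIMIT)
```

Under these assumptions, the small user-supplied metadata is never the problem; the stored source text is what pushes the filterable portion past the limit. Whether declaring that key non-filterable is the right fix for a given setup is not confirmed in this thread.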

answered 10 months ago
0

Hey all, check out https://github.com/awslabs/s3vectors-embed-cli/pull/9 - trying to squash this bug, this addresses the root cause. Thanks!

answered 9 months ago
