
How to bulk ingest vectors/embeddings via boto3 into an S3 vector bucket index?


I am testing S3 vector buckets. Based on examples like the one shown below, we can ingest multiple items into an index created under an S3 vector bucket. My question is: what is the maximum size or number of items we can push to an index at once using the boto3 put_vectors call?

import boto3

s3vectors_client = boto3.client('s3vectors')

vectors_data = [
    {"key": "item1", "vector": {"float32": [0.1, 0.2, 0.3]}, "metadata": {"title": "Product A"}},
    {"key": "item2", "vector": {"float32": [0.4, 0.5, 0.6]}, "metadata": {"title": "Product B"}}
]

response = s3vectors_client.put_vectors(
    BucketName='your-vector-bucket-name',
    IndexName='your-vector-index-name',
    Vectors=vectors_data
)

print(response)
asked 9 months ago, 326 views
2 Answers

To bulk ingest vectors/embeddings into an S3 vector index using boto3, you can use the put_vectors API operation. This method allows you to insert multiple vectors in a single call, which is recommended for maximizing write throughput.

When using the put_vectors method, your example code needs a few corrections:

  1. The request parameters are lowerCamelCase: vectorBucketName, indexName, and vectors instead of BucketName, IndexName, and Vectors
  2. Within each vector, the values should be specified under data instead of vector
  3. Each vector consists of a key, data (as float32 values), and optional metadata

Here's the corrected code:

import boto3

s3vectors_client = boto3.client('s3vectors')

vectors_data = [
    {"key": "item1", "data": {"float32": [0.1, 0.2, 0.3]}, "metadata": {"title": "Product A"}},
    {"key": "item2", "data": {"float32": [0.4, 0.5, 0.6]}, "metadata": {"title": "Product B"}}
]

response = s3vectors_client.put_vectors(
    vectorBucketName='your-vector-bucket-name',
    indexName='your-vector-index-name',
    vectors=vectors_data
)

print(response)

Regarding the maximum batch size for put_vectors, AWS recommends inserting vectors in large batches, up to the maximum batch size limit, to maximize throughput. However, the sources listed below do not state the specific limit on the number of vectors per put_vectors call. That limit should be documented in the "Limitations and restrictions" section of the S3 Vectors documentation.

When inserting vectors, keep in mind:

  • Each vector must have a unique key within the index
  • Inserting a vector with an existing key will overwrite the previous vector
  • You can attach metadata as key-value pairs to each vector
  • S3 vector indexes support string, number, boolean, and list types of metadata
  • There are limits on the total metadata size per vector and filterable metadata size per vector
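
As a rough illustration of the points above, the sketch below builds records whose metadata uses each of the listed types and flags duplicate keys before upload, since a repeated key would overwrite the earlier vector. The helper name find_duplicate_keys and the sample metadata fields (price, in_stock, tags) are hypothetical and not part of boto3.

```python
from collections import Counter


def find_duplicate_keys(vectors):
    """Return the keys that appear more than once in the batch, sorted."""
    counts = Counter(v["key"] for v in vectors)
    return sorted(key for key, n in counts.items() if n > 1)


# Sample records; the metadata fields are made up for illustration and
# cover string, number, boolean, and list values.
vectors_data = [
    {"key": "item1", "data": {"float32": [0.1, 0.2, 0.3]},
     "metadata": {"title": "Product A", "price": 19.99,
                  "in_stock": True, "tags": ["new", "sale"]}},
    {"key": "item2", "data": {"float32": [0.4, 0.5, 0.6]},
     "metadata": {"title": "Product B", "price": 5, "in_stock": False}},
    # Same key as the first record: uploading this would replace "item1".
    {"key": "item1", "data": {"float32": [0.7, 0.8, 0.9]},
     "metadata": {"title": "Product A (revised)"}},
]

duplicates = find_duplicate_keys(vectors_data)
print(duplicates)  # ['item1']
```

Running a check like this before calling put_vectors is optional; it only makes an otherwise silent overwrite visible.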

For optimal performance, it's recommended to batch your vectors when inserting them and to follow S3 Vectors best practices.
Sources
Inserting vectors into a vector index - Amazon Simple Storage Service
Tutorial: Getting started with S3 Vectors - Amazon Simple Storage Service
Introducing Amazon S3 Vectors: First cloud storage with native vector support at scale (preview) | AWS News Blog
S3Vectors - Boto3 1.39.5 documentation

answered 9 months ago

Hello.

The maximum number of vectors in one put_vectors call appears to be 500, according to the API reference:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_S3VectorBuckets_PutVectors.html

Array Members: Minimum number of 1 item. Maximum number of 500 items.

Other limitations are described in the following documents:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-limitations.html
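
Given the 500-item limit quoted above, a larger data set has to be split across several calls. Here is a minimal sketch of one way to do that. The helper names chunked and put_vectors_in_batches are my own, not part of boto3, and the sketch assumes the client exposes put_vectors with the vectorBucketName, indexName, and vectors parameters. It does no retrying or error handling.

```python
MAX_VECTORS_PER_CALL = 500  # limit quoted from the PutVectors API reference


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_vectors_in_batches(client, bucket_name, index_name, vectors,
                           batch_size=MAX_VECTORS_PER_CALL):
    """Call client.put_vectors once per batch; return the number of calls made.

    `client` is expected to be a boto3 's3vectors' client (or any object
    with a compatible put_vectors method).
    """
    calls = 0
    for batch in chunked(vectors, batch_size):
        client.put_vectors(
            vectorBucketName=bucket_name,
            indexName=index_name,
            vectors=batch,
        )
        calls += 1
    return calls
```

With a real client this could be called as put_vectors_in_batches(boto3.client('s3vectors'), 'your-vector-bucket-name', 'your-vector-index-name', vectors_data). Other limits from the limitations page may also apply, so a smaller batch_size may be needed in practice.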

EXPERT
answered 9 months ago
