Indexing of metadata in OpenSearch Serverless

How are a document's chunks and metadata stored and used by OpenSearch? After chunking, are the chunks and the metadata stored in a single file, or in different files? I have read in some places that they are stored document-store style, with each chunk associated with its metadata in the same file. Elsewhere I have read that OpenSearch stores them in two different files and uses some other technique to retrieve the correct chunk via the metadata. Also, if I change only the metadata file, are the chunking and indexing of the main file redone as well?

Accepted Answer

In Amazon OpenSearch Serverless, the indexing process is handled automatically by the service's architecture, which separates indexing and search components for improved scalability and performance.

When data is ingested into an OpenSearch Serverless collection, it is distributed to indexing compute units (OpenSearch Compute Units, or OCUs). These units process the incoming data, including both the document content and its associated metadata. The indexed data, which includes both the document chunks and their metadata, is then stored in Amazon S3 as the primary storage.
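As a rough illustration of what gets ingested, the sketch below builds a request body in the NDJSON shape used by the OpenSearch `_bulk` API, with each chunk carried as one JSON document. The index name `my-chunks` and the field names `text`, `embedding`, and `metadata` are assumptions made for the example, not names OpenSearch Serverless requires, and the tiny vectors are placeholders.

```python
import json


def build_bulk_body(index_name, chunks):
    """Serialize chunk records into an NDJSON body for the _bulk API."""
    lines = []
    for chunk in chunks:
        # Action line: index into the target index (document ID left to the service).
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: chunk text, vector, and metadata in one JSON document.
        lines.append(json.dumps(chunk))
    # The bulk format is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical chunk records; field names and values are illustrative only.
chunks = [
    {
        "text": "First chunk of the report.",
        "embedding": [0.1, 0.2, 0.3],
        "metadata": {"source": "report.pdf", "page": 1},
    },
    {
        "text": "Second chunk of the report.",
        "embedding": [0.4, 0.5, 0.6],
        "metadata": {"source": "report.pdf", "page": 2},
    },
]

bulk_body = build_bulk_body("my-chunks", chunks)
print(bulk_body)
```

The body would then be sent to the collection endpoint by a signed HTTP client; that part is omitted here because it depends on your credentials and client library.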

OpenSearch Serverless does not explicitly separate metadata and document chunks into different files. Instead, it uses a unified approach where each document, including its content and metadata, is indexed and stored together. This allows for efficient retrieval and querying of both the document content and its associated metadata.
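Because the metadata lives in the same document as the chunk, a single query can match on the chunk text and filter on the metadata at once. The sketch below builds such a query body using the standard `bool` query with a `match` clause and a `term` filter. The field names `text` and `metadata.source` are hypothetical, and the example assumes `metadata.source` is mapped as a `keyword` field so that the `term` filter compares exact values.

```python
import json


def build_filtered_query(query_text, source, size=5):
    """Build a query body that matches chunk text and filters on metadata."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match against the chunk content (assumed field: "text").
                "must": [{"match": {"text": query_text}}],
                # Exact-value filter on a metadata field
                # (assumed to be mapped as keyword: "metadata.source").
                "filter": [{"term": {"metadata.source": source}}],
            }
        },
    }


query_body = build_filtered_query("quarterly revenue", "report.pdf")
print(json.dumps(query_body, indent=2))
```

The filter narrows the candidate set to documents whose metadata matches, and the `match` clause ranks the remaining chunks by relevance.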

The indexing process in OpenSearch Serverless is designed to be automatic and optimized for performance. When you make changes to a document, including updates to metadata, the system will handle the necessary re-indexing. However, it's important to note that OpenSearch Serverless manages this process internally, and users don't have direct control over how individual chunks or metadata are stored or updated.

Regarding your question about changing only the metadata: In OpenSearch Serverless, you typically update documents as a whole rather than just metadata. When you update a document, including its metadata, the system will handle the necessary re-indexing of that document. This ensures that both the content and metadata remain consistent and searchable.
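To make "updating the document as a whole" concrete, here is a conceptual sketch that runs locally and does not call OpenSearch. It assumes chunking and embedding happen in your own ingestion pipeline before documents reach the collection, and that each stored document uses the hypothetical fields `text`, `embedding`, and `metadata`. Under those assumptions, a metadata-only change produces a full replacement document in which the chunk text and vector are carried over unchanged.

```python
import copy


def with_updated_metadata(stored_doc, metadata_changes):
    """Return a full replacement document with only the metadata changed."""
    # Copy so the caller's version of the stored document is not mutated.
    new_doc = copy.deepcopy(stored_doc)
    # Merge the changed metadata keys; text and embedding are left as they were.
    new_doc["metadata"].update(metadata_changes)
    return new_doc


# Hypothetical stored chunk document.
stored_doc = {
    "text": "First chunk of the report.",
    "embedding": [0.1, 0.2, 0.3],
    "metadata": {"source": "report.pdf", "page": 1, "status": "draft"},
}

replacement = with_updated_metadata(stored_doc, {"status": "final"})
print(replacement)
```

The replacement document is what would be sent back for indexing. Whether chunking or embedding is re-run is then a property of the pipeline that produces the documents, not something this sketch decides.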

It's worth noting that OpenSearch Serverless uses different storage strategies depending on the collection type (time series, search, or vector search). For example, time series collections use a combination of hot and warm storage, while search and vector search collections keep all data in hot storage for faster query response times.

Overall, OpenSearch Serverless abstracts away many of the low-level details of indexing and storage, focusing on providing a scalable and efficient search solution without requiring manual management of these processes.

Sources
What is Amazon OpenSearch Serverless? - Amazon OpenSearch Service
How is the indexing process taken care of in opensearch serverless? | AWS re:Post
How does OpenSearch Serverless Collection determine which index strategy/index type to use while data is being ingested to one of the collections? | AWS re:Post

answered 2 months ago
