In Amazon OpenSearch Serverless, the indexing process is handled automatically by the service's architecture, which separates indexing and search components for improved scalability and performance.
When data is ingested into an OpenSearch Serverless collection, it is distributed to indexing OpenSearch Compute Units (OCUs). These units process each incoming document, content and metadata alike, and the resulting indexed data is stored in Amazon S3, which serves as the primary storage.
OpenSearch Serverless does not explicitly separate metadata and document chunks into different files. Instead, it uses a unified approach where each document, including its content and metadata, is indexed and stored together. This allows for efficient retrieval and querying of both the document content and its associated metadata.
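To make the unified approach concrete, here is a minimal sketch of what one such document might look like before it is sent to a collection. The field names (`chunk_text`, `embedding`, `metadata`) and the helper `build_chunk_document` are hypothetical and chosen only for illustration; your own index mapping decides the real names.

```python
import json


def build_chunk_document(chunk_text, embedding, metadata):
    """Return one JSON-serializable document that carries content and metadata together.

    All field names here are illustrative assumptions, not a schema
    required by OpenSearch Serverless.
    """
    return {
        "chunk_text": chunk_text,      # the document chunk itself
        "embedding": list(embedding),  # vector for the chunk, if you use one
        "metadata": dict(metadata),    # metadata lives in the same document
    }


doc = build_chunk_document(
    "OpenSearch Serverless separates indexing and search.",
    [0.12, -0.03, 0.88],
    {"source": "guide.pdf", "page": 4},
)

# This single JSON body is what would be indexed; there is no separate
# "metadata file" to manage on the client side.
print(json.dumps(doc, indent=2))
```

Because content and metadata travel in one document, a query can filter on a metadata field and match on the chunk text in the same request.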
Indexing in OpenSearch Serverless is automatic. When you change a document, including its metadata, the service re-indexes it for you. This happens internally, so you have no direct control over how individual chunks or metadata are stored or updated.
Regarding your question about changing only the metadata: in OpenSearch Serverless, an update applies to the document as a whole rather than to the metadata alone. Even when you send only the changed fields, OpenSearch applies the change by re-indexing the entire document, so the content and metadata stay consistent and searchable.
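The whole-document update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's internal behavior: the helper `with_updated_metadata` and all field names are hypothetical, and actually sending the result (for example with an OpenSearch client's index call against the same document ID) is left out and assumes your collection type lets you address documents by ID.

```python
import copy


def with_updated_metadata(document, metadata_changes):
    """Return a full replacement document with the metadata changes merged in.

    The original document is left untouched; the returned copy still
    contains the unchanged content fields, because the whole document
    is what gets re-indexed.
    """
    updated = copy.deepcopy(document)
    updated.setdefault("metadata", {}).update(metadata_changes)
    return updated


original = {
    "chunk_text": "OpenSearch Serverless separates indexing and search.",
    "metadata": {"source": "guide.pdf", "page": 4},
}

replacement = with_updated_metadata(original, {"reviewed": True})

# The replacement carries both the untouched content and the new metadata.
print(replacement)
```

The point of the sketch is that the client prepares one complete document; there is no separate path for writing metadata on its own.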
Storage strategy also depends on the collection type (time series, search, or vector search). Time series collections use a combination of hot and warm storage, while search and vector search collections keep all data in hot storage for faster query response times.
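As a quick reference, the mapping described above can be written down as a small lookup. This only restates the paragraph in code; the names `STORAGE_TIERS` and `storage_tiers_for` are hypothetical, and the keys are the plain-language collection types used in this answer rather than API constants.

```python
# Collection type -> storage tiers, as described in the paragraph above.
STORAGE_TIERS = {
    "time series": ("hot", "warm"),
    "search": ("hot",),
    "vector search": ("hot",),
}


def storage_tiers_for(collection_type):
    """Look up the storage tiers for a collection type (case-insensitive)."""
    key = collection_type.strip().lower()
    if key not in STORAGE_TIERS:
        raise ValueError(f"unknown collection type: {collection_type!r}")
    return STORAGE_TIERS[key]


print(storage_tiers_for("time series"))
```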
Overall, OpenSearch Serverless abstracts away many of the low-level details of indexing and storage, focusing on providing a scalable and efficient search solution without requiring manual management of these processes.
Sources
What is Amazon OpenSearch Serverless? - Amazon OpenSearch Service
How is the indexing process taken care of in opensearch serverless? | AWS re:Post
How does OpenSearch Serverless Collection determine which index strategy/index type to use while data is being ingested to one of the collections? | AWS re:Post