How do I improve search results and retrieval accuracy in Amazon Bedrock Knowledge Bases?

I want to improve the search result accuracy in my Amazon Bedrock knowledge bases.

Resolution

Use foundation models to parse documents

When the documents are complex, unstructured, or contain domain-specific terminology, it's a best practice to use foundation models to parse documents. Foundation models improve the retrieval of complex data within documents such as nested tables, text within images, and graphical representations of text. To customize how the foundation model parses your documents, provide instructions based on your document structure, domain, or use case.
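As a rough illustration of the parsing setup described above, the following sketch builds the parsing portion of a data source configuration as a plain Python dictionary. The field names follow the `parsingConfiguration` structure of the boto3 `bedrock-agent` `create_data_source` call as best understood here, so verify them against the current API reference. The model ARN and the instruction text are placeholders, not recommendations.

```python
def build_parsing_configuration(model_arn, parsing_instructions=None):
    """Build a parsingConfiguration dict that selects foundation model parsing.

    model_arn and parsing_instructions are caller-supplied; nothing here
    contacts AWS.
    """
    fm_config = {"modelArn": model_arn}
    if parsing_instructions:
        # Optional custom instructions that tell the model how to read your documents.
        fm_config["parsingPrompt"] = {"parsingPromptText": parsing_instructions}
    return {
        "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
        "bedrockFoundationModelConfiguration": fm_config,
    }


# Placeholder ARN: replace EXAMPLE_MODEL_ID with a model that your Region supports.
parsing_config = build_parsing_configuration(
    "arn:aws:bedrock:us-east-1::foundation-model/EXAMPLE_MODEL_ID",
    "Extract every table as Markdown and transcribe text that appears in images.",
)
```

You would typically pass the result as `vectorIngestionConfiguration={"parsingConfiguration": parsing_config}` when you create the data source.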

Use advanced chunking strategies

Use semantic chunking or hierarchical chunking to improve Retrieval Augmented Generation (RAG) performance.

Use semantic chunking for documents without clear contextual boundaries, such as legal documents or technical manuals. Semantic chunking splits text where the meaning shifts instead of at a fixed size, so each chunk keeps related content together and retrieval returns more precise passages.

Note: You incur additional costs when you use semantic chunking. The cost depends on how much data you have. For information about pricing, see Amazon Bedrock pricing.

Use hierarchical chunking for complex documents with a nested structure, such as technical documents or academic papers with complex formatting and nested tables. Hierarchical chunking organizes content into smaller child chunks nested inside larger parent chunks, so that a query can match a precise passage and still retrieve the broader surrounding context from a large document. Use foundation models to parse data first, and then use hierarchical chunking to improve the accuracy of generated responses.

To customize the chunking process to align with your RAG application requirements, use a custom AWS Lambda function.
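The semantic and hierarchical strategies above can be sketched as chunking configuration dictionaries. This is a minimal example that assumes the `chunkingConfiguration` field names used by the boto3 `bedrock-agent` `create_data_source` call. The token counts and thresholds are illustrative values only, so tune them for your own documents.

```python
def semantic_chunking(max_tokens=300, buffer_size=0, breakpoint_percentile=95):
    """Chunking config that splits text where the meaning changes."""
    return {
        "chunkingStrategy": "SEMANTIC",
        "semanticChunkingConfiguration": {
            "maxTokens": max_tokens,            # upper bound on chunk size
            "bufferSize": buffer_size,          # neighboring sentences compared together
            "breakpointPercentileThreshold": breakpoint_percentile,
        },
    }


def hierarchical_chunking(parent_max_tokens=1500, child_max_tokens=300, overlap_tokens=60):
    """Chunking config with large parent chunks and smaller child chunks."""
    if child_max_tokens >= parent_max_tokens:
        raise ValueError("child chunks must be smaller than parent chunks")
    return {
        "chunkingStrategy": "HIERARCHICAL",
        "hierarchicalChunkingConfiguration": {
            # First level is the parent layer, second level is the child layer.
            "levelConfigurations": [
                {"maxTokens": parent_max_tokens},
                {"maxTokens": child_max_tokens},
            ],
            "overlapTokens": overlap_tokens,
        },
    }


semantic_config = semantic_chunking()
hierarchical_config = hierarchical_chunking()
```

Either dictionary would go under `vectorIngestionConfiguration={"chunkingConfiguration": ...}` for the data source. Chunking is applied at ingestion time, so expect to re-sync the data source after you change it.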

Filter your metadata

Use .csv files to include metadata in a data source. To reduce the number of required files and improve your data management, use columns to designate content fields and metadata fields. It's a best practice to use this feature for large .csv file data sets.

Add filters on document fields or attributes to improve the relevancy of responses. Your data sources can include document metadata attributes or fields that you filter on, and you can specify which fields to embed. For more information, see Amazon Bedrock Knowledge Bases now supports metadata filtering to improve retrieval accuracy.
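To make the filtering step concrete, here is a small sketch that composes a metadata filter for a retrieval request. It assumes the filter operators (`equals`, `greaterThanOrEquals`, `andAll`) accepted by the boto3 `bedrock-agent-runtime` `retrieve` call. The metadata keys `department` and `year` are hypothetical, so substitute the attributes that your documents actually carry.

```python
def equals(key, value):
    """Match documents whose metadata attribute equals the given value."""
    return {"equals": {"key": key, "value": value}}


def at_least(key, value):
    """Match documents whose metadata attribute is >= the given value."""
    return {"greaterThanOrEquals": {"key": key, "value": value}}


def all_of(*filters):
    """Combine two or more filters so that every one of them must match."""
    if len(filters) < 2:
        raise ValueError("all_of needs at least two filters")
    return {"andAll": list(filters)}


# Hypothetical attributes: only finance documents from 2023 onward.
metadata_filter = all_of(equals("department", "finance"), at_least("year", 2023))

# The filter sits inside the vector search configuration of the request.
filtered_retrieval_config = {"vectorSearchConfiguration": {"filter": metadata_filter}}
```

You would pass `filtered_retrieval_config` as the `retrievalConfiguration` argument of a `retrieve` call.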

Customize your queries

Break a complex query into smaller, more manageable sub-queries. When you use query decomposition, Amazon Bedrock runs multiple queries on your knowledge base. To modify your query, see the Query modifications tab on Configure and customize queries and response generation.
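Query decomposition can also be turned on programmatically. The sketch below builds the keyword arguments for a boto3 `bedrock-agent-runtime` `retrieve_and_generate` call, assuming the `orchestrationConfiguration` structure as best understood here. The knowledge base ID, model ARN, and question are placeholders.

```python
def build_rag_request(knowledge_base_id, model_arn, question, decompose=True):
    """Build retrieve_and_generate kwargs, optionally with query decomposition."""
    kb_config = {"knowledgeBaseId": knowledge_base_id, "modelArn": model_arn}
    if decompose:
        # Ask the service to split the question into sub-queries before retrieval.
        kb_config["orchestrationConfiguration"] = {
            "queryTransformationConfiguration": {"type": "QUERY_DECOMPOSITION"}
        }
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": kb_config,
        },
    }


# Placeholder identifiers for illustration only.
rag_request = build_rag_request(
    "EXAMPLEKBID",
    "arn:aws:bedrock:us-east-1::foundation-model/EXAMPLE_MODEL_ID",
    "How did revenue and headcount change between 2022 and 2023?",
)
```

With a client created by `boto3.client("bedrock-agent-runtime")`, the request would be sent as `client.retrieve_and_generate(**rag_request)`.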

By default, Amazon Bedrock returns up to five results when you query a knowledge base, and each result corresponds to a source chunk. To improve your search results, increase the number of source chunks that Amazon Bedrock returns. To increase the number of source chunks, see the Number of source chunks tab on Configure and customize queries and response generation.
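As a minimal sketch of raising the number of source chunks through the API, the following function builds the keyword arguments for a boto3 `bedrock-agent-runtime` `retrieve` call with `numberOfResults` set explicitly. The knowledge base ID and query text are placeholders, and the value of 10 is only an example.

```python
def build_retrieve_request(knowledge_base_id, query, number_of_results=5):
    """Build retrieve kwargs that request a specific number of source chunks."""
    if number_of_results < 1:
        raise ValueError("number_of_results must be at least 1")
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": number_of_results}
        },
    }


# Ask for ten chunks instead of the default five.
retrieve_request = build_retrieve_request(
    "EXAMPLEKBID", "What is the refund policy?", number_of_results=10
)
```

The request would be sent as `client.retrieve(**retrieve_request)`. More chunks give the model more context but also add more tokens to the prompt, so increase the value gradually.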

Use hybrid search

To combine more than one search algorithm, use hybrid search. Semantic search provides answers based on the meaning of the text. However, it can miss relevant exact keywords, and it depends on the quality of the embeddings that represent the text's meaning. Hybrid search combines semantic search with keyword search to improve search results.
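A short sketch of selecting the search type per request follows. It assumes the `overrideSearchType` field of the boto3 `bedrock-agent-runtime` `retrieve` call, which accepts `HYBRID` or `SEMANTIC`. Whether hybrid search is available depends on the vector store behind your knowledge base, so treat this as a configuration example, not a guarantee that the option applies to your setup.

```python
VALID_SEARCH_TYPES = ("HYBRID", "SEMANTIC")


def search_type_config(search_type="HYBRID"):
    """Build a retrievalConfiguration that overrides the search type."""
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {VALID_SEARCH_TYPES}")
    return {"vectorSearchConfiguration": {"overrideSearchType": search_type}}


hybrid_config = search_type_config("HYBRID")
semantic_only_config = search_type_config("SEMANTIC")
```

Passing `hybrid_config` as `retrievalConfiguration` requests combined semantic and keyword matching. Running the same query with `semantic_only_config` gives a baseline to compare against.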

Use reranker models

Use reranker models to improve the relevance of the results that Amazon Bedrock retrieves.
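As an illustration, a reranker can be attached to a retrieval request so that a larger candidate set is retrieved first and then reordered by relevance. The sketch below assumes the `rerankingConfiguration` structure of the boto3 `bedrock-agent-runtime` `retrieve` call as best understood here, so check the field names against the current API reference. The reranker model ARN is a placeholder, and the candidate and result counts are example values.

```python
def reranked_search_config(reranker_model_arn, candidates=20, reranked_results=5):
    """Build a retrievalConfiguration that retrieves candidates, then reranks them."""
    if reranked_results > candidates:
        raise ValueError("cannot keep more reranked results than retrieved candidates")
    return {
        "vectorSearchConfiguration": {
            # Retrieve a wide candidate set for the reranker to choose from.
            "numberOfResults": candidates,
            "rerankingConfiguration": {
                "type": "BEDROCK_RERANKING_MODEL",
                "bedrockRerankingConfiguration": {
                    "modelConfiguration": {"modelArn": reranker_model_arn},
                    # Keep only the top results after reranking.
                    "numberOfRerankedResults": reranked_results,
                },
            },
        }
    }


# Placeholder ARN: replace with a reranker model available in your Region.
rerank_config = reranked_search_config(
    "arn:aws:bedrock:us-west-2::foundation-model/EXAMPLE_RERANK_MODEL_ID"
)
```

Retrieving more candidates than you finally keep gives the reranker room to promote relevant chunks that the initial vector search ranked lower.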