
How to Configure Metadata Filtering with Bedrock Knowledge Bases on OpenSearch Managed Cluster

10 minute read
Content level: Expert

Solve the "Rewrite first" error in Bedrock Knowledge Bases with OpenSearch managed clusters. Learn the correct index mapping structure for metadata filtering, including why custom fields need the text type with a keyword subfield.

Bedrock Knowledge Bases enables you to implement RAG by connecting foundation models to your proprietary data. When using metadata filtering for multi-tenancy or content segmentation, customers often encounter cryptic errors when integrating with OpenSearch Service (managed cluster) as the vector store.

This guide addresses a common issue that is not yet deeply covered in current public documentation: the "failed to create query: Rewrite first" error that occurs when custom metadata fields are not properly structured for filtering operations. This article provides a comprehensive solution based on hands-on testing and validation with OpenSearch Service (managed cluster) version 2.13 and later.

This guide is for anyone who is:

  • Building multi-tenant RAG applications with metadata filtering
  • Designing Bedrock Knowledge Base architectures on OpenSearch
  • Troubleshooting "Rewrite first" errors during implementation

Example use case:

A common scenario is a multi-tenant RAG application where each document chunk includes metadata such as tenant_id, company_id, category, and document_type. Metadata filtering ensures that each user only retrieves content that belongs to their tenant and context, while still benefiting from vector similarity search.

Architecture Overview

The architecture involves:

User Query
    ↓
Bedrock Knowledge Base
    ↓
Metadata Filter Processing
    ↓
OpenSearch Managed Cluster (Vector Search + Metadata Filtering)
    ↓
Retrieved Chunks (filtered by metadata)
    ↓
Foundation Model (generates response)

The integration requires an OpenSearch managed cluster (v2.13 or higher) with the k-NN plugin enabled, a properly configured index with vector and metadata fields, IAM roles with appropriate permissions, and a Fine-Grained Access Control configuration.

Common Error Messages

When metadata filtering is misconfigured, you'll run into one or more of these errors:

The "Rewrite first" Error

ValidationException: failed to create query: Rewrite first
(Service: BedrockAgentRuntime, Status Code: 400)

You'll see this during retrieve() or retrieveAndGenerate() API calls when using metadata filters.

Example failing request:

import boto3

bedrock_runtime = boto3.client('bedrock-agent-runtime')

response = bedrock_runtime.retrieve(
    knowledgeBaseId='KB_ID',
    retrievalQuery={'text': 'query text'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'filter': {
                'equals': {
                    'key': 'company_id',
                    'value': '123'
                }
            }
        }
    }
)

Object Mapping Parse Error

object mapping for [AMAZON_BEDROCK_METADATA] tried to parse field 
[AMAZON_BEDROCK_METADATA] as object, but found a concrete value

This happens when AMAZON_BEDROCK_METADATA is incorrectly defined as type object instead of text.

Embedding Field Type Error

failed to create query: Field 'embedding' is not knn_vector type

You'll encounter this when dynamic mapping has converted the embedding field from knn_vector to float.

What Causes These Errors

OpenSearch's k-NN plugin requires text fields used in exact-match filters to have a .keyword subfield. When Bedrock processes metadata filters, it automatically appends .keyword to field names for exact matching. If a field is defined simply as type keyword without the nested subfield structure, Bedrock searches for field_name.keyword which doesn't exist, triggering the "Rewrite first" error. This error originates from OpenSearch's k-NN plugin behavior - when a k-NN search with a filter references a non-existent field, OpenSearch throws this cryptic message instead of a clear "field not found" error.
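The lookup described above can be illustrated with a minimal sketch. It assumes, as this article describes, that an equals filter on a key targets the field named key.keyword. The helper name filter_target_exists is hypothetical, and it only inspects a mapping dictionary locally rather than calling OpenSearch.

```python
def filter_target_exists(properties: dict, key: str) -> bool:
    """Return True if `<key>.keyword` resolves to a keyword subfield in the mapping."""
    field = properties.get(key, {})
    subfield = field.get("fields", {}).get("keyword", {})
    return subfield.get("type") == "keyword"


# A plain keyword field has no `.keyword` subfield, so the lookup fails.
plain_keyword = {"company_id": {"type": "keyword"}}

# A text field with a keyword subfield resolves correctly.
text_with_subfield = {
    "company_id": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }
}

print(filter_target_exists(plain_keyword, "company_id"))       # False
print(filter_target_exists(text_with_subfield, "company_id"))  # True
```

The first mapping is the shape that leads to the "Rewrite first" error when a filter is applied; the second is the shape the rest of this guide uses.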

The second issue relates to how Bedrock stores its internal metadata. Bedrock serializes metadata as a JSON string in the AMAZON_BEDROCK_METADATA field rather than as a structured object. As a result, when this field is defined as type object with nested properties, OpenSearch rejects documents during indexing because it expects an object structure but receives a string.

The third issue stems from OpenSearch's dynamic mapping behavior. By default, OpenSearch can override explicit type definitions when new documents are indexed. When an array of floats is indexed, OpenSearch may interpret it as type float instead of preserving the knn_vector type you defined, which breaks vector search entirely.

Correct Index Mapping

Here's a validated index mapping for OpenSearch managed cluster with Bedrock Knowledge Bases:

{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      },
      "AMAZON_BEDROCK_TEXT_CHUNK": {
        "type": "text",
        "index": true
      },
      "AMAZON_BEDROCK_METADATA": {
        "type": "text",
        "index": false
      },
      "company_id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "tenant_id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "category": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

Key Configuration Elements

Setting "dynamic": false at the mapping level prevents OpenSearch from automatically adding new fields and ensures the knn_vector type is preserved during ingestion. Without this setting, the first document indexed can change your carefully defined field types.

Custom metadata fields need a specific structure. The primary type should be text for full-text search capability, with a nested keyword subfield that enables exact-match filtering:

"field_name": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

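Bedrock automatically uses the .keyword subfield for filter operations. The ignore_above: 256 parameter tells OpenSearch not to index values longer than 256 characters in the keyword subfield. This avoids the performance cost of very long keywords, but it also means that a longer value cannot be matched by an exact-match filter.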
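Because values longer than ignore_above are not indexed in the keyword subfield, it can help to check metadata values before ingestion. The following is a minimal sketch; the helper name oversized_values is hypothetical and the 256 limit simply mirrors the mapping in this article.

```python
def oversized_values(metadata_attributes: dict, limit: int = 256) -> dict:
    """Return the string attributes whose length exceeds the ignore_above limit."""
    return {
        key: len(value)
        for key, value in metadata_attributes.items()
        if isinstance(value, str) and len(value) > limit
    }


attributes = {
    "company_id": "company-123",
    "category": "x" * 300,  # too long for ignore_above: 256
}
print(oversized_values(attributes))  # {'category': 300}
```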

The vector field configuration requires careful attention. The dimension parameter must match your embedding model - use 1024 for Cohere Embed Multilingual v3 and Amazon Titan Embed Text v2. The engine must be set to faiss because Bedrock requires it (using nmslib will cause validation errors).

Important: The AMAZON_BEDROCK_METADATA field must be defined as type text with index: false. Bedrock uses this field internally for document tracking and stores it as a JSON string, not as a structured object.
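When the list of metadata fields is long, the mapping can be generated instead of written by hand. This is a sketch under the assumptions of this article (field names embedding, AMAZON_BEDROCK_TEXT_CHUNK, and AMAZON_BEDROCK_METADATA, and the HNSW parameters shown above); the function names are hypothetical.

```python
import json


def keyword_filterable_field(ignore_above: int = 256) -> dict:
    """Text field with a keyword subfield, the structure used for filterable metadata."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": ignore_above}},
    }


def build_index_body(dimension: int, metadata_fields: list[str]) -> dict:
    """Assemble the index body from this article for a given dimension and field list."""
    properties = {
        "embedding": {
            "type": "knn_vector",
            "dimension": dimension,
            "method": {
                "name": "hnsw",
                "space_type": "l2",
                "engine": "faiss",
                "parameters": {"ef_construction": 128, "m": 24},
            },
        },
        "AMAZON_BEDROCK_TEXT_CHUNK": {"type": "text", "index": True},
        "AMAZON_BEDROCK_METADATA": {"type": "text", "index": False},
    }
    for name in metadata_fields:
        properties[name] = keyword_filterable_field()
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {"dynamic": False, "properties": properties},
    }


body = build_index_body(1024, ["company_id", "tenant_id", "category"])
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of the index creation request, so every filterable field is guaranteed to share the same structure.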

Implementation Steps

Prerequisites

You'll need an OpenSearch managed cluster running version 2.13 or higher with Fine-Grained Access Control enabled. Your IAM role for Bedrock requires permissions including es:ESHttpGet, es:ESHttpPut, es:ESHttpPost, es:DescribeDomain, and bedrock:InvokeModel for your embedding model.

Configure Fine-Grained Access Control

In OpenSearch Dashboards, add your Bedrock service role ARN to the "Backend roles" mapping for the all_access role.

Using the OpenSearch API:

curl -X PUT \
  "https://<OPENSEARCH-ENDPOINT>/_plugins/_security/api/rolesmapping/all_access" \
  -u "admin:<PASSWORD>" \
  -H "Content-Type: application/json" \
  -d '{
    "backend_roles": ["arn:aws:iam::<ACCOUNT-ID>:role/BedrockKnowledgeBaseRole"]
  }'
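In the OpenSearch Security plugin, a PUT to a role mapping creates or replaces it, so the request above would overwrite whatever is already mapped to all_access. One way to avoid losing existing entries is to read the current mapping first and send back a merged body. The sketch below only builds that body; the helper name merged_role_mapping and the example ARNs are hypothetical, and it assumes the current mapping was fetched with a GET on the same endpoint.

```python
def merged_role_mapping(existing: dict, new_backend_role: str) -> dict:
    """Build a PUT body that keeps the current mapping entries and adds one backend role."""
    body = {
        "backend_roles": list(existing.get("backend_roles", [])),
        "hosts": list(existing.get("hosts", [])),
        "users": list(existing.get("users", [])),
    }
    if new_backend_role not in body["backend_roles"]:
        body["backend_roles"].append(new_backend_role)
    return body


# Example of what a GET on the all_access mapping might contain (illustrative values).
existing = {
    "backend_roles": ["arn:aws:iam::111122223333:role/Admin"],
    "hosts": [],
    "users": ["admin"],
    "reserved": False,
    "hidden": False,
}
kb_role = "arn:aws:iam::111122223333:role/BedrockKnowledgeBaseRole"
body = merged_role_mapping(existing, kb_role)
print(body["backend_roles"])
```

Read-only attributes such as reserved and hidden are deliberately left out of the body, since they describe the mapping rather than configure it.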

Create the OpenSearch Index

Navigate to OpenSearch Dashboards → Dev Tools and execute the mapping shown above using:

PUT /bedrock-kb-index

Before proceeding, verify the index was created correctly:

GET /bedrock-kb-index/_mapping

Confirm that embedding is type knn_vector, that each custom field has a keyword subfield, and that dynamic is set to false.
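These checks can also be automated. The sketch below takes the mappings section of a GET _mapping response (the dictionary found under the index name) and reports anything that differs from the structure in this guide. The function name mapping_problems is hypothetical, and the field names are the ones used in this article.

```python
def mapping_problems(mappings: dict, custom_fields: list[str]) -> list[str]:
    """Return human-readable problems found in an index's `mappings` section."""
    problems = []
    properties = mappings.get("properties", {})

    # GET _mapping may report `dynamic` as the string "false", so compare as text.
    if str(mappings.get("dynamic", "true")).lower() != "false":
        problems.append("dynamic is not false")
    if properties.get("embedding", {}).get("type") != "knn_vector":
        problems.append("embedding is not knn_vector")
    if properties.get("AMAZON_BEDROCK_METADATA", {}).get("type") != "text":
        problems.append("AMAZON_BEDROCK_METADATA is not text")
    for name in custom_fields:
        subfield = properties.get(name, {}).get("fields", {}).get("keyword", {})
        if subfield.get("type") != "keyword":
            problems.append(f"{name} has no keyword subfield")
    return problems


# An intentionally broken mapping to show the kind of report produced.
broken = {
    "dynamic": "true",
    "properties": {
        "embedding": {"type": "float"},
        "AMAZON_BEDROCK_METADATA": {"type": "text", "index": False},
        "company_id": {"type": "keyword"},
    },
}
for problem in mapping_problems(broken, ["company_id"]):
    print(problem)
```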

Create the Knowledge Base

Use the AWS CLI:

aws bedrock-agent create-knowledge-base \
  --name "my-knowledge-base" \
  --role-arn "arn:aws:iam::<ACCOUNT-ID>:role/BedrockKnowledgeBaseRole" \
  --knowledge-base-configuration '{
    "type": "VECTOR",
    "vectorKnowledgeBaseConfiguration": {
      "embeddingModelArn": "arn:aws:bedrock:<REGION>::foundation-model/cohere.embed-multilingual-v3"
    }
  }' \
  --storage-configuration '{
    "type": "OPENSEARCH_MANAGED_CLUSTER",
    "opensearchManagedClusterConfiguration": {
      "domainArn": "arn:aws:es:<REGION>:<ACCOUNT-ID>:domain/<DOMAIN-NAME>",
      "domainEndpoint": "https://<OPENSEARCH-ENDPOINT>",
      "vectorIndexName": "bedrock-kb-index",
      "fieldMapping": {
        "vectorField": "embedding",
        "textField": "AMAZON_BEDROCK_TEXT_CHUNK",
        "metadataField": "AMAZON_BEDROCK_METADATA"
      }
    }
  }'

Ingest Documents with Metadata

For S3 data sources, create a metadata file alongside your document with .metadata.json appended.

Example document.pdf.metadata.json:

{
  "metadataAttributes": {
    "company_id": "company-123",
    "tenant_id": "tenant-456",
    "category": "financial-reports"
  }
}
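For more than a handful of documents, the sidecar files can be generated programmatically. This is a minimal sketch that writes the file locally before upload; the helper name write_metadata_sidecar is hypothetical, and the placeholder PDF bytes exist only to make the example runnable.

```python
import json
import tempfile
from pathlib import Path


def write_metadata_sidecar(document_path: Path, attributes: dict) -> Path:
    """Write `<document name>.metadata.json` next to the document and return its path."""
    sidecar = document_path.with_name(document_path.name + ".metadata.json")
    sidecar.write_text(json.dumps({"metadataAttributes": attributes}, indent=2))
    return sidecar


with tempfile.TemporaryDirectory() as tmp:
    document = Path(tmp) / "document.pdf"
    document.write_bytes(b"%PDF-1.4 placeholder")  # stand-in for a real document
    sidecar = write_metadata_sidecar(
        document,
        {
            "company_id": "company-123",
            "tenant_id": "tenant-456",
            "category": "financial-reports",
        },
    )
    written = json.loads(sidecar.read_text())
    print(sidecar.name)  # document.pdf.metadata.json
```

Both files then need to be uploaded to the same S3 prefix so the data source can pair them during sync.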

Testing the Solution

Start by verifying documents were ingested:

GET /bedrock-kb-index/_search?size=1

Test without filters first:

import boto3

bedrock_runtime = boto3.client('bedrock-agent-runtime')

response = bedrock_runtime.retrieve(
    knowledgeBaseId='<KB-ID>',
    retrievalQuery={'text': 'test query'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5
        }
    }
)

Now test with a metadata filter - this should no longer produce the "Rewrite first" error:

response = bedrock_runtime.retrieve(
    knowledgeBaseId='<KB-ID>',
    retrievalQuery={'text': 'test query'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'filter': {
                'equals': {
                    'key': 'company_id',
                    'value': 'company-123'
                }
            }
        }
    }
)

Test with multiple filters:

response = bedrock_runtime.retrieve(
    knowledgeBaseId='<KB-ID>',
    retrievalQuery={'text': 'financial data'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'filter': {
                'andAll': [
                    {'equals': {'key': 'company_id', 'value': 'company-123'}},
                    {'equals': {'key': 'category', 'value': 'financial-reports'}}
                ]
            }
        }
    }
)
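In a multi-tenant application the filter is usually built from request context rather than hard-coded. The sketch below turns a dictionary of required values into the filter structure used in the calls above. The helper name build_filter is hypothetical; it returns a single equals clause when only one condition is given, because andAll expects at least two clauses.

```python
def build_filter(conditions: dict) -> dict:
    """Turn {key: value} pairs into a Bedrock retrieval filter."""
    if not conditions:
        raise ValueError("at least one condition is required")
    clauses = [
        {"equals": {"key": key, "value": value}}
        for key, value in conditions.items()
    ]
    # andAll needs at least two clauses, so a single condition is returned as-is.
    return clauses[0] if len(clauses) == 1 else {"andAll": clauses}


single = build_filter({"company_id": "company-123"})
combined = build_filter({"company_id": "company-123", "category": "financial-reports"})
print(single)
print(combined)
```

The result can be passed directly as the 'filter' value inside 'vectorSearchConfiguration'.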

Common Mistakes to Avoid

The most frequent mistake is defining custom fields as simple keyword type without the subfield structure. Bedrock expects to find a .keyword subfield when filtering, so always use the text type with a nested keyword field.

Another issue is forgetting "dynamic": false in your mapping. Without this, OpenSearch can automatically modify field types during ingestion, commonly causing the embedding field to convert from knn_vector to float.

The AMAZON_BEDROCK_METADATA field causes confusion because it seems logical to define it as type object, but Bedrock actually stores it as a JSON string. Always use type text with index: false.

Using nmslib as the vector engine will cause validation errors because Bedrock requires faiss. Make sure your dimension parameter matches your embedding model - 1024 for Cohere Embed Multilingual v3 and Amazon Titan Embed Text v2.

Important: Always verify your mapping before creating the Knowledge Base. Use GET /index-name/_mapping to check that embedding is knn_vector, all custom fields have .keyword subfields, and AMAZON_BEDROCK_METADATA is text type. If any are wrong, delete the index and recreate it with the correct mapping - you cannot modify existing field types after creation.

Adding New Metadata Fields

You can add new fields to an existing index:

PUT /bedrock-kb-index/_mapping
{
  "properties": {
    "department": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

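New fields are available for filtering as soon as the mapping is updated, and you don't need to update your Knowledge Base configuration. Note that OpenSearch does not reindex existing documents when a field is added to the mapping, so documents ingested earlier are not filterable on the new field until they are re-ingested. You cannot change the type of an existing field or delete a field once it is created.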
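Since existing fields cannot be changed, a mapping update should only contain fields the index doesn't have yet. A minimal sketch of that check follows; the helper name mapping_update_for is hypothetical, and existing_properties stands for the properties section of the current mapping.

```python
def mapping_update_for(existing_properties: dict, wanted_fields: list[str]) -> dict:
    """Build a PUT _mapping body containing only fields the index doesn't have yet."""
    new_fields = [name for name in wanted_fields if name not in existing_properties]
    return {
        "properties": {
            name: {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
            for name in new_fields
        }
    }


existing_properties = {"company_id": {"type": "text"}, "tenant_id": {"type": "text"}}
update = mapping_update_for(existing_properties, ["tenant_id", "department"])
print(list(update["properties"]))  # ['department']
```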

Best Practices

When implementing metadata filtering, plan your metadata schema before creating the index. Think through the fields you'll need upfront based on your use case - this minimizes mapping updates later. Stick to snake_case for naming, use descriptive names, and avoid reserved words like id or type. Set ignore_above values that match your expected data: 100 for short IDs, 256 for standard strings. Test your filters in development with realistic data volumes before deploying to production.

Troubleshooting

When the "Rewrite first" error persists after fixing your mapping, verify the mapping is actually correct with GET /bedrock-kb-index/_mapping. Check you're using the correct index name in your Knowledge Base configuration and try retrieving without filters first to isolate the issue.

If Knowledge Base creation fails with "embedding is not knn_vector type", delete the index and recreate it with "dynamic": false and the complete mapping in one operation. Verify afterward that embedding shows type knn_vector.

When metadata doesn't appear in query results, check it was actually ingested by examining a sample document. Verify field names match exactly - they're case-sensitive. If filters return all documents instead of filtered results, look for field name mismatches or test with a known value from an actual document in your index.
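To confirm that a filter is actually being applied, the metadata returned with each result can be compared against the filter value. The sketch below works on a retrieve() response dictionary; the helper name results_violating_filter is hypothetical, and the sample response is hand-written for illustration rather than captured from a real call.

```python
def results_violating_filter(response: dict, key: str, expected: str) -> list[dict]:
    """Return retrieval results whose metadata doesn't carry the expected value."""
    return [
        result
        for result in response.get("retrievalResults", [])
        if result.get("metadata", {}).get(key) != expected
    ]


# Hand-written sample shaped like a retrieve() response.
sample_response = {
    "retrievalResults": [
        {"content": {"text": "Q1 report"}, "metadata": {"company_id": "company-123"}},
        {"content": {"text": "Other tenant"}, "metadata": {"company_id": "company-999"}},
    ]
}
violations = results_violating_filter(sample_response, "company_id", "company-123")
print(len(violations))  # 1
```

Any non-empty result here points to a field name mismatch or a filter that never reached the index.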

Key Takeaways

Getting metadata filtering to work with Bedrock Knowledge Bases on an OpenSearch managed cluster comes down to the field structure. Custom metadata fields must use the text type with a keyword subfield, the mapping must set "dynamic": false to prevent type conversion issues, and AMAZON_BEDROCK_METADATA must be type text. Use the faiss engine, plan your schema upfront, and verify mappings before creating the Knowledge Base.

The index mapping provided here resolves the "Rewrite first" error that occurs with misconfigured metadata fields and has been validated through hands-on testing in real-world deployments.

For additional information, refer to the Bedrock Knowledge Bases Documentation, the OpenSearch k-NN Documentation, and the blog post Bedrock Knowledge Bases with OpenSearch Managed Cluster.

AWS
SUPPORT ENGINEER
published 11 days ago