Filtered query relevancy degradation in S3 Vectors -- and a potential architectural fix

Filtered Query Relevancy Issues in Multimodal RAG Pipelines

I've been running a multimodal RAG pipeline on S3 Vectors (~1,500 documents, ~60% images) and found that filtered queries consistently return results scoring ~10% lower in relevancy than unfiltered queries for the same content. In some cases, filters return the wrong results entirely: visually similar but incorrect matches.

Analysis & Findings

The issue appears to be a combination of two factors:

  1. Quantization Noise: 4-bit quantization noise becomes proportionally larger as the candidate pool shrinks.
  2. HNSW Graph Disconnection: When filters remove "bridge" nodes, the traversal algorithm loses necessary paths, often getting trapped in local minima.
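
To make the second factor concrete, here is a minimal sketch of how dropping a single bridge node can split a neighbor graph. The graph, node names, and `reachable` helper are all made up for illustration; this is not how S3 Vectors stores or walks its index, just the general effect on any graph where filtered-out nodes are also skipped during traversal.

```python
from collections import deque

# Toy neighbor graph (hypothetical): two clusters joined only through "bridge".
GRAPH = {
    "a1": ["a2", "bridge"],
    "a2": ["a1"],
    "bridge": ["a1", "b1"],
    "b1": ["bridge", "b2"],
    "b2": ["b1"],
}


def reachable(graph, start, allowed):
    """Return nodes reachable from start when traversal may only visit nodes in allowed."""
    if start not in allowed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr in allowed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


all_nodes = set(GRAPH)
passes_filter = all_nodes - {"bridge"}  # the filter happens to exclude the bridge

unfiltered = reachable(GRAPH, "a1", all_nodes)
filtered = reachable(GRAPH, "a1", passes_filter)

print(sorted(unfiltered))  # every node is reachable
print(sorted(filtered))    # only the "a" cluster; b1 and b2 are cut off
```

Even though `b1` and `b2` pass the filter, a traversal that starts in the `a` cluster can never reach them once `bridge` is skipped.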

Note: Reranking was ineffective here as the corpus is primarily images with metadata; standard cross-encoders aren't optimized for this data type. We are currently using a 1.25x static boost on filtered scores in production as a workaround.
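
For reference, the workaround amounts to something like the sketch below. The function names and the 0.7 threshold are placeholders I picked for illustration, not values from our pipeline; only the 1.25 factor is what we actually use.

```python
FILTER_BOOST = 1.25  # static factor applied to scores from filtered queries


def apply_boost(score, filtered, factor=FILTER_BOOST):
    """Scale a similarity score when it came from a filtered query."""
    return score * factor if filtered else score


def passes_threshold(score, filtered, threshold=0.7):
    """Check a (possibly boosted) score against a hypothetical relevancy cutoff."""
    return apply_boost(score, filtered) >= threshold


scores = [0.62, 0.48, 0.55]
boosted = [apply_boost(s, filtered=True) for s in scores]

# Ranking within a single filtered result set is unchanged by a constant factor.
rank_before = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
rank_after = sorted(range(len(boosted)), key=lambda i: boosted[i], reverse=True)
```

Because the factor is a positive constant, it does not reorder results inside one filtered query. It only matters where filtered scores are compared against a fixed cutoff or merged with unfiltered scores, which is why I consider it a stopgap rather than a fix.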

The Challenge with S3-Based Architectures

While ACORN (two-hop neighbor expansion) addresses this for in-memory stores, it assumes cheap random access. Because the S3 Vectors graph resides on object storage, every additional hop translates to an S3 read, making standard two-hop expansion prohibitively expensive.


Feature Request: Filter-Aware HNSW Traversal

Would the S3 Vectors team consider implementing filter-aware HNSW traversal?

  • The Proposal: Allow filtered nodes to be used for graph navigation even if they are excluded from the final results.
  • The Benefit: This maintains graph connectivity and prevents traversal from failing when filters remove critical bridge nodes.
  • Cost Impact: The cost model should remain stable as filtered nodes are traversed but not returned.
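
A rough sketch of what I mean, on a toy 1-D index. Everything here is hypothetical (the chain graph, the `filtered_search` function, the `navigate_filtered` flag); it is a simplified best-first search, not real HNSW and not the S3 Vectors implementation. The second return value counts nodes touched, as a crude stand-in for storage reads.

```python
import heapq


def filtered_search(graph, vectors, query, entry, k, passes, navigate_filtered):
    """Best-first search returning (top-k keys that pass the filter, nodes touched).

    navigate_filtered=False: nodes failing the filter are never visited.
    navigate_filtered=True: such nodes are used for navigation but never returned.
    """
    def dist(n):
        return abs(vectors[n] - query)

    visited = {entry}
    frontier = [(dist(entry), entry)]  # min-heap of nodes to expand
    results = []                       # max-heap (negated distance) of passing nodes
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(results) == k and d > -results[0][0]:
            break  # closest remaining candidate is worse than the worst kept result
        if passes(node):
            heapq.heappush(results, (-d, node))
            if len(results) > k:
                heapq.heappop(results)  # drop the current worst result
        for nbr in graph[node]:
            if nbr in visited:
                continue
            if not navigate_filtered and not passes(nbr):
                continue
            visited.add(nbr)
            heapq.heappush(frontier, (dist(nbr), nbr))
    ordered = sorted((-neg_d, n) for neg_d, n in results)
    return [n for _, n in ordered], len(visited)


# Chain 0-1-2-3-4 with 1-D "embeddings"; node 2 is the bridge and fails the filter.
vectors = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
passes = lambda n: n != 2

strict_hits, strict_touched = filtered_search(graph, vectors, 4.0, 0, 1, passes, False)
aware_hits, aware_touched = filtered_search(graph, vectors, 4.0, 0, 1, passes, True)
```

In this toy case the strict traversal gets stuck at node 1 and returns it, while the filter-aware traversal walks through node 2 and finds node 4, the true nearest match. It also touches more nodes (5 versus 2 here), so how stable the cost stays in practice would depend on how selective the filter is.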

The primary challenge is adapting this for object storage access patterns where in-memory assumptions don't apply. Is this approach currently on the roadmap, or are others in the community seeing similar relevancy drops when applying filters?

asked 3 months ago · 69 views
1 Answer
The relevancy degradation you're experiencing with filtered queries in S3 Vectors is a known challenge in vector search systems. When a system runs the similarity search first and applies metadata filters afterward (post-filtering), it can "overfilter": most of the nearest candidates are discarded, so the query returns fewer results than expected, or results that are not the best matches within your filtered subset.

The issue you've identified—where filters remove "bridge" nodes in the HNSW graph structure—is particularly relevant. HNSW creates a multi-layered graph where vectors connect to their nearest neighbors, and when filters eliminate key connecting nodes, the traversal algorithm can indeed get trapped in suboptimal regions of the vector space, leading to lower relevancy scores.

For S3 Vectors specifically, the architecture is optimized for cost-effective storage of large vector datasets with sub-second query performance, making it ideal for infrequent query workloads rather than ultra-low latency scenarios. The service is designed to reduce costs by up to 90% compared to alternative solutions, but this comes with architectural trade-offs around how filtering and retrieval are handled.

When implementing filtered vector searches, consider these approaches:

Query Configuration: S3 Vectors supports metadata filtering through the query_vectors API, where you can specify filter conditions alongside your vector query and topK parameters. The system returns distance values that can be converted to similarity scores for ranking.
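
As a minimal sketch, a filtered query request could be assembled as below. The parameter names follow my understanding of the boto3 `s3vectors` client's `query_vectors` call and should be checked against the current SDK documentation. The bucket name, index name, and `doc_type` metadata key are hypothetical, and the `1 - distance` conversion assumes the index uses cosine distance.

```python
def build_query(bucket, index, embedding, top_k, metadata_filter=None):
    """Build keyword arguments for a filtered query_vectors call."""
    request = {
        "vectorBucketName": bucket,
        "indexName": index,
        "queryVector": {"float32": list(embedding)},
        "topK": top_k,
        "returnDistance": True,
        "returnMetadata": True,
    }
    if metadata_filter:
        request["filter"] = metadata_filter
    return request


def to_similarity(response):
    """Convert returned distances to similarity scores (assumes cosine distance)."""
    return [(v["key"], 1.0 - v["distance"]) for v in response.get("vectors", [])]


request = build_query(
    bucket="my-vector-bucket",  # hypothetical name
    index="my-index",           # hypothetical name
    embedding=[0.1, 0.2, 0.3],
    top_k=5,
    metadata_filter={"doc_type": {"$eq": "image"}},  # hypothetical metadata key
)

# With AWS credentials configured, the call itself would look roughly like:
#   import boto3
#   client = boto3.client("s3vectors")
#   response = client.query_vectors(**request)
#   ranked = to_similarity(response)
```

Keeping the request construction separate from the network call makes it easy to log or compare the filtered and unfiltered variants of the same query.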

Metadata Strategy: S3 Vectors supports up to 40 KB of metadata per vector with schema-less filtering capabilities. Optimizing your metadata structure and filter design can help improve result quality.

Performance Tuning: Experiment with different topK values when applying filters—requesting more results than needed and then post-processing may help compensate for relevancy drops, though this increases query costs.
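
One possible form of that post-processing, sketched under assumptions: over-fetch candidates from the filtered query, then re-score them client-side with exact cosine similarity against full-precision vectors. The `rerank_exact` helper and the candidate format are hypothetical, and this presumes you either request the vector data back with the results or keep the original embeddings available locally.

```python
import math


def cosine(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_exact(query, candidates, k):
    """Re-score over-fetched (key, vector) candidates and keep the best k."""
    scored = [(key, cosine(query, vec)) for key, vec in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Pretend these came back from a filtered query issued with topK larger than k.
candidates = [
    ("a", [0.0, 1.0]),
    ("b", [1.0, 0.0]),
    ("c", [1.0, 1.0]),
]
top = rerank_exact([1.0, 0.0], candidates, k=2)
top_keys = [key for key, _ in top]
```

This only helps when the right item is somewhere in the over-fetched set. It can correct ordering errors among returned candidates, but it cannot recover matches the traversal never reached.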

Regarding your specific feature request for filter-aware HNSW traversal, this type of optimization would need to balance the architectural constraints of object storage access patterns with the benefits of maintaining graph connectivity. The challenge, as you've noted, is that S3-based architectures have different cost models than in-memory systems, where each additional hop translates to storage read operations.

For your multimodal use case with primarily image content, you might also consider whether your embedding model is optimally suited for the task. Unified multimodal embedding models that process text and images into the same vector space can help ensure consistent similarity scoring across modalities, which may partially address the relevancy issues you're seeing with filtered queries.

Your 1.25x static boost workaround is a pragmatic solution, though ideally the system would handle this natively. Continue monitoring your relevancy metrics and consider A/B testing different filter strategies and topK configurations to find the optimal balance for your specific use case.

Sources

  • Building enterprise-scale RAG applications with Amazon S3 Vectors and DeepSeek R1 on Amazon SageMaker AI | Artificial Intelligence
  • Supercharging vector search performance and relevance with pgvector 0.8.0 on Amazon Aurora PostgreSQL | AWS Database Blog
  • Crossmodal search with Amazon Nova Multimodal Embeddings | Artificial Intelligence

answered 3 months ago