The relevancy degradation you're seeing with filtered queries in S3 Vectors is a known challenge in vector search systems. When a system retrieves candidates by similarity first and applies the metadata filter afterward (post-filtering), the filter can discard most of those candidates. This effect, sometimes called "overfiltering," returns fewer results than requested, or results that are not the best matches within your filtered subset.
The issue you've identified, where filters remove "bridge" nodes in the HNSW graph structure, is particularly relevant. HNSW builds a multi-layered graph in which each vector connects to its nearest neighbors. When a filter eliminates key connecting nodes, the traversal can get trapped in a suboptimal region of the vector space and return matches with lower relevancy scores.
For S3 Vectors specifically, the architecture is optimized for cost-effective storage of large vector datasets with sub-second query performance, making it ideal for infrequent query workloads rather than ultra-low latency scenarios. The service is designed to reduce costs by up to 90% compared to alternative solutions, but this comes with architectural trade-offs around how filtering and retrieval are handled.
When implementing filtered vector searches, consider these approaches:
Query Configuration: S3 Vectors supports metadata filtering through the query_vectors API, where you can specify filter conditions alongside your vector query and topK parameters. The system returns distance values that can be converted to similarity scores for ranking.
Metadata Strategy: S3 Vectors supports up to 40 KB of metadata per vector with schema-less filtering capabilities, although only part of that budget can be marked filterable (check the current service limits). Keeping filterable keys small and low-cardinality, and designing filters that do not exclude most of the index, can help improve result quality.
Performance Tuning: Experiment with different topK values when applying filters. Requesting more results than you need and then post-processing them may help compensate for relevancy drops, though this increases query costs.
Regarding your specific feature request for filter-aware HNSW traversal, this type of optimization would need to balance the constraints of object storage access patterns against the benefit of maintaining graph connectivity. The challenge, as you've noted, is that in S3-based architectures each additional graph hop translates into storage read operations, so the cost model differs from that of in-memory systems.
For your multimodal use case with primarily image content, you might also consider whether your embedding model is optimally suited for the task. Unified multimodal embedding models that process text and images into the same vector space can help ensure consistent similarity scoring across modalities, which may partially address the relevancy issues you're seeing with filtered queries.
Your 1.25x static boost workaround is a pragmatic solution, though ideally the system would handle this natively. Continue monitoring your relevancy metrics and consider A/B testing different filter strategies and topK configurations to find the optimal balance for your specific use case.
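For the A/B testing suggested above, one simple relevancy metric is recall@k against an exact baseline. The sketch below assumes you can compute ground truth yourself, for example by brute-force search over only the vectors that pass the filter. `recall_at_k`, `compare_configs`, the configuration names, and the sample keys are all hypothetical and only illustrate the bookkeeping.

```python
from statistics import mean
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the achievable relevant keys found in the top k results."""
    if not relevant or k <= 0:
        return 0.0
    found = sum(1 for key in retrieved[:k] if key in relevant)
    return found / min(k, len(relevant))


def compare_configs(
    runs: Dict[str, Dict[str, List[str]]],
    ground_truth: Dict[str, Set[str]],
    k: int,
) -> Dict[str, float]:
    """Mean recall@k per configuration, averaged over all ground-truth queries."""
    return {
        name: mean(
            recall_at_k(results.get(qid, []), relevant, k)
            for qid, relevant in ground_truth.items()
        )
        for name, results in runs.items()
    }


# Ground truth: exact best matches within the filtered subset (assumed known).
ground_truth = {"q1": {"a", "d"}, "q2": {"e", "g"}}

# Keys returned by two hypothetical query configurations.
runs = {
    "filter_topk_2": {"q1": ["a"], "q2": ["e", "x"]},
    "overfetch_topk_6": {"q1": ["a", "d"], "q2": ["e", "g"]},
}

scores = compare_configs(runs, ground_truth, k=2)
print(scores)
```

Tracking a number like this per filter and per topK setting makes it easier to tell whether a workaround such as the static boost is still needed after a configuration change.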
Sources
Building enterprise-scale RAG applications with Amazon S3 Vectors and DeepSeek R1 on Amazon SageMaker AI | Artificial Intelligence
Supercharging vector search performance and relevance with pgvector 0.8.0 on Amazon Aurora PostgreSQL | AWS Database Blog
Crossmodal search with Amazon Nova Multimodal Embeddings | Artificial Intelligence
