
AWS Kendra Retrieve API filter not working


Hello,

This is how I am calling the Kendra Retrieve API:

response = client.retrieve(
    IndexId=KENDRA_INDEX_ID,
    QueryText=query,
    PageSize=page_size,
    PageNumber=page_number,
    # AttributeFilter={
    #     'ContainsAny': {
    #         'Key': '_source_uri',
    #         'Value': {'StringListValue': ['client-138']}
    #     }
    # }
)

As the code above shows, the call works, but I only get results when the AttributeFilter is left out. I have tried several variations of the AttributeFilter, and none of them returns any results. I want to return only files that have 'client-138' in their URI; an example file path is "s3://production-bucket/client-138/xyz.pdf".

For the key I have also tried 'DocumentURI', 'source_uri', etc., but I get no search results whenever the AttributeFilter is present.

Thank you.

Asked 6 months ago, viewed 112 times
3 Answers

It seems you're encountering an issue with the AttributeFilter in your Kendra query. There are a few things to consider:

  1. API Difference: You're using the Retrieve API, but the AttributeFilter is typically used with the Query API. The Retrieve API has different parameters and functionality compared to the Query API.

  2. Attribute Key: The correct attribute key for the document URI is typically '_source_uri' or 'SourceURI', so your approach there is correct.

  3. Filter Operation: Instead of 'ContainsAny', you might want to use 'Contains' since you're looking for a specific string within the URI.

  4. Value Type: For a single string value, use 'StringValue' instead of 'StringListValue'.

Here's a suggested modification to your code, using the Query API instead of Retrieve:

response = client.query(
    IndexId=KENDRA_INDEX_ID,
    QueryText=query,
    PageSize=page_size,
    PageNumber=page_number,
    AttributeFilter={
        'Contains': {
            'Key': '_source_uri',
            'Value': {'StringValue': 'client-138'}
        }
    }
)

If you specifically need to use the Retrieve API, note that it might not support the same filtering capabilities as the Query API. In that case, you may need to perform the filtering on the client side after retrieving the results.

Also, ensure that the documents in your index have the correct metadata for the '_source_uri' field and that it contains the value you're filtering for. If the filtering still doesn't work, you might want to check your index configuration and verify that the attribute you're filtering on is properly set up and indexed.
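
The client-side filtering mentioned above could be sketched roughly as follows. This is only an illustration: it assumes each item in the Retrieve response's 'ResultItems' list carries a 'DocumentURI' string, and the helper name filter_by_document_uri and the sample items are made up for the example rather than taken from a real response.

```python
from typing import Any, Dict, List


def filter_by_document_uri(
    result_items: List[Dict[str, Any]], fragment: str
) -> List[Dict[str, Any]]:
    """Keep only result items whose DocumentURI contains the given fragment."""
    return [
        item for item in result_items
        if fragment in item.get("DocumentURI", "")
    ]


# Hand-written sample shaped like response["ResultItems"]; not real output.
sample_items = [
    {"DocumentId": "a", "DocumentURI": "s3://production-bucket/client-138/xyz.pdf"},
    {"DocumentId": "b", "DocumentURI": "s3://production-bucket/client-200/abc.pdf"},
    {"DocumentId": "c"},  # an item without a URI is simply skipped
]

matches = filter_by_document_uri(sample_items, "/client-138/")
print([item["DocumentId"] for item in matches])
```

With a live index the same helper would be called as filter_by_document_uri(response['ResultItems'], '/client-138/'). Matching on '/client-138/' with the slashes avoids also matching a folder such as 'client-1380'.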
Sources
Choose specific documents from Kendra index for RAG | AWS re:Post
Query - Amazon Kendra API Reference
Filtering and facet search - Amazon Kendra

Answered 6 months ago
  • 'Contains' is not a supported parameter


The 'ContainsAny' filter does not work here because it applies only to attributes of type 'StringList', whereas '_source_uri' is a reserved index field of type 'String'. As an alternative, you can filter the results with custom logic; a sample is shared below:

# Ask Kendra to return the _source_uri attribute with each result
response = kendra.query(
    IndexId=INDEX_ID,
    QueryText=QUERY_TEXT,
    PageSize=100,
    RequestedDocumentAttributes=['_source_uri']
)

# Keep only results whose _source_uri contains the client-138 folder
filtered_docs = [
    item for item in response['ResultItems']
    if any(
        attr['Key'] == '_source_uri' and
        '/client-138/' in attr['Value'].get('StringValue', '')
        for attr in item.get('DocumentAttributes', [])
    )
]
AWS
Support Engineer
Answered 6 months ago

In the earlier answer, under "Filter Operation", you suggested using 'Contains' instead of 'ContainsAny', since we're trying to match a specific substring within the URI.

The "name" attribute is configured in the index with the following settings: Facetable: true, Searchable: true, Displayable: true, and Sortable: true. Here's the query I attempted:

Query parameters:

{
  "IndexId": "a9826c71-b19f-48af-8420-402613d9a91d",
  "PageSize": 100,
  "QueryText": "*",
  "AttributeFilter": {
    "Contains": {
      "Key": "name",
      "Value": { "StringValue": "john" }
    }
  }
}

However, this resulted in an error: Error querying Kendra (query): Parameter validation failed: Unknown parameter in AttributeFilter: "Contains", must be one of: AndAllFilters, OrAllFilters, NotFilter, EqualsTo, ContainsAll, ContainsAny, GreaterThan, GreaterThanOrEquals, LessThan, LessThanOrEquals

I've also checked the official documentation, and it confirms that "Contains" is not a valid filter operation in Kendra.

Could you please advise how I can achieve a partial string match (e.g., filter where "name" contains "john"), given that the attribute is a StringValue?
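
For reference, here is a minimal sketch of a filter built only from operators that appear in the error message above ('EqualsTo' and 'OrAllFilters'). It does exact matching, not partial matching, so it only helps if the value to match is stored in its own attribute. The attribute name 'client_id' is hypothetical: it assumes a custom String attribute that has been added to the index and populated at ingestion time. The helper names are also made up for illustration.

```python
from typing import Any, Dict, List


def equals_filter(key: str, value: str) -> Dict[str, Any]:
    """Build an exact-match filter on a String attribute."""
    return {"EqualsTo": {"Key": key, "Value": {"StringValue": value}}}


def any_of_filter(key: str, values: List[str]) -> Dict[str, Any]:
    """Build a filter that matches if the attribute equals any of the values."""
    return {"OrAllFilters": [equals_filter(key, value) for value in values]}


# 'client_id' is a hypothetical custom attribute, not a built-in Kendra field.
single_client = equals_filter("client_id", "client-138")
several_clients = any_of_filter("client_id", ["client-138", "client-139"])

print(single_client)
print(several_clients)
```

Either dictionary would then be passed as AttributeFilter=single_client (or several_clients) in the query call. Whether this works depends on the attribute actually existing in the index with matching values.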

Answered 4 months ago
