
AWS Kendra Retrieve API filter not working

Hello,

This is how I am calling the Kendra Retrieve API:

response = client.retrieve(
    IndexId=KENDRA_INDEX_ID,
    QueryText=query,
    PageSize=page_size,
    PageNumber=page_number,
    # AttributeFilter={
    #     'ContainsAny': {
    #         'Key': '_source_uri',
    #         'Value': {'StringListValue': ['client-138']}
    #     }
    # }
)

As the snippet above shows, the call works, but I only get results when the AttributeFilter is commented out. I have tried several variations of the AttributeFilter and none of them returns any results. I am trying to restrict results to files that have 'client-138' in their URI; an example file path is "s3://production-bucket/client-138/xyz.pdf".

For the key I have also tried 'DocumentURI', 'source_uri', etc., but I get no search results whenever the AttributeFilter is present.

Thank you.

Asked 6 months ago · 113 views
3 Answers

It seems you're encountering an issue with the AttributeFilter in your Kendra query. There are a few things to consider:

  1. API Difference: You're using the Retrieve API, but the AttributeFilter is typically used with the Query API. The Retrieve API has different parameters and functionality compared to the Query API.

  2. Attribute Key: The correct attribute key for the document URI is typically '_source_uri' or 'SourceURI', so your approach there is correct.

  3. Filter Operation: Instead of 'ContainsAny', you might want to use 'Contains' since you're looking for a specific string within the URI.

  4. Value Type: For a single string value, use 'StringValue' instead of 'StringListValue'.

Here's a suggested modification to your code, using the Query API instead of Retrieve:

response = client.query(
    IndexId=KENDRA_INDEX_ID,
    QueryText=query,
    PageSize=page_size,
    PageNumber=page_number,
    AttributeFilter={
        'Contains': {
            'Key': '_source_uri',
            'Value': {'StringValue': 'client-138'}
        }
    }
)

If you specifically need to use the Retrieve API, note that it might not support the same filtering capabilities as the Query API. In that case, you may need to perform the filtering on the client side after retrieving the results.

Also, ensure that the documents in your index have the correct metadata for the '_source_uri' field and that it contains the value you're filtering for. If the filtering still doesn't work, you might want to check your index configuration and verify that the attribute you're filtering on is properly set up and indexed.
Sources
Choose specific documents from Kendra index for RAG | AWS re:Post
Query - Amazon Kendra API Reference
Filtering and facet search - Amazon Kendra

Answered 6 months ago
  • 'Contains' is not a supported parameter

The 'ContainsAny' filter is not working because it only works with attributes of type 'StringList', whereas '_source_uri' is a reserved index field of type 'String'. As an alternative, you can filter the results with custom logic on the client side. A sample is shared below:

import boto3

kendra = boto3.client('kendra')

# Ask Kendra to return _source_uri with each result so it can be checked locally.
response = kendra.query(
    IndexId=INDEX_ID,
    QueryText=QUERY_TEXT,
    PageSize=100,
    RequestedDocumentAttributes=['_source_uri']
)

# Keep only results whose _source_uri contains '/client-138/'.
filtered_docs = [
    item for item in response['ResultItems']
    if any(
        attr['Key'] == '_source_uri' and
        '/client-138/' in attr['Value'].get('StringValue', '')
        for attr in item.get('DocumentAttributes', [])
    )
]
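
The same client-side idea can be applied to the Retrieve call from the question. The following is only a sketch: it assumes each Retrieve result item exposes a top-level 'DocumentURI' field and falls back to the '_source_uri' document attribute otherwise. The helper names and the sample items (including 'client-200', 'abc.pdf', and 'notes.pdf') are hypothetical and hand-made, not real Kendra output.

```python
from typing import Any, Dict, List


def source_uri(item: Dict[str, Any]) -> str:
    """Return the item's URI: top-level DocumentURI if present (assumed),
    otherwise the _source_uri document attribute, otherwise ''."""
    uri = item.get("DocumentURI")
    if uri:
        return uri
    for attr in item.get("DocumentAttributes", []):
        if attr.get("Key") == "_source_uri":
            return attr.get("Value", {}).get("StringValue", "")
    return ""


def filter_by_uri_fragment(items: List[Dict[str, Any]], fragment: str) -> List[Dict[str, Any]]:
    """Keep only the items whose URI contains `fragment` (plain substring match)."""
    return [item for item in items if fragment in source_uri(item)]


# Hand-made sample in the shape of response["ResultItems"]; not real output.
sample_items = [
    {"DocumentURI": "s3://production-bucket/client-138/xyz.pdf"},
    {"DocumentURI": "s3://production-bucket/client-200/abc.pdf"},
    {"DocumentAttributes": [
        {"Key": "_source_uri",
         "Value": {"StringValue": "s3://production-bucket/client-138/notes.pdf"}}
    ]},
]

matches = filter_by_uri_fragment(sample_items, "/client-138/")
print(len(matches))
```

With a live client this would be called as filter_by_uri_fragment(response['ResultItems'], '/client-138/'). Because the filtering happens after retrieval, a page can yield fewer matches than PageSize, so more pages may need to be fetched to collect enough results.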
AWS
Support Engineer
Answered 6 months ago

As you suggested earlier, I tried using 'Contains' instead of 'ContainsAny', since the goal is to match a specific substring within the attribute value.

The "name" attribute is configured in the index with the following settings: Facetable: true, Searchable: true, Displayable: true, and Sortable: true. Here's the query I attempted:

Query parameters:

{
  "IndexId": "a9826c71-b19f-48af-8420-402613d9a91d",
  "PageSize": 100,
  "QueryText": "*",
  "AttributeFilter": {
    "Contains": {
      "Key": "name",
      "Value": { "StringValue": "john" }
    }
  }
}

However, this resulted in an error: Error querying Kendra (query): Parameter validation failed: Unknown parameter in AttributeFilter: "Contains", must be one of: AndAllFilters, OrAllFilters, NotFilter, EqualsTo, ContainsAll, ContainsAny, GreaterThan, GreaterThanOrEquals, LessThan, LessThanOrEquals

I've also checked the official documentation, and it confirms that "Contains" is not a valid filter operation in Kendra.

Could you please advise how I can achieve a partial string match (e.g., filter for documents where "name" contains "john"), given that the attribute is a StringValue?
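
For reference, the error message above lists the operators that are accepted. A minimal sketch of a filter built only from those operators ('EqualsTo' and 'OrAllFilters') is shown here. It matches whole values, not substrings, so it only helps when the full values are known in advance. The helper names and the second sample value ('john smith') are hypothetical.

```python
from typing import Any, Dict, List


def equals_filter(key: str, value: str) -> Dict[str, Any]:
    """Build an exact-match filter on a string attribute."""
    return {"EqualsTo": {"Key": key, "Value": {"StringValue": value}}}


def any_of_filter(key: str, values: List[str]) -> Dict[str, Any]:
    """Match documents whose attribute equals any one of `values`."""
    if not values:
        raise ValueError("at least one value is required")
    if len(values) == 1:
        return equals_filter(key, values[0])
    return {"OrAllFilters": [equals_filter(key, v) for v in values]}


# The resulting dict is what would be passed as AttributeFilter=... in the request.
attribute_filter = any_of_filter("name", ["john", "john smith"])
print(attribute_filter)
```

This does not answer the partial-match question by itself. It only shows the shape of a request that passes parameter validation.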

Answered 4 months ago
