
AWS Kendra Retrieve API filter not working


Hello,

This is how I am calling the Kendra Retrieve API:

response = client.retrieve(
    IndexId=KENDRA_INDEX_ID,
    QueryText=query,
    PageSize=page_size,
    PageNumber=page_number,
    # AttributeFilter={
    #     'ContainsAny': {
    #         'Key': '_source_uri',
    #         'Value': {'StringListValue': ['client-138']}
    #     }
    # }
)

As the code above shows, the call works, but I only get results when AttributeFilter is left out. I have tried multiple variations of AttributeFilter and none of them returns any results. I am trying to return only files that have 'client-138' in their URI; an example file path is "s3://production-bucket/client-138/xyz.pdf".

For the key, I have also tried 'DocumentURI', 'source_uri', etc., but I get no search results whenever AttributeFilter is present.

Thank you.

asked 6 months ago · 103 views
3 answers

It seems you're encountering an issue with the AttributeFilter in your Kendra query. There are a few things to consider:

  1. API Difference: You're using the Retrieve API, but the AttributeFilter is typically used with the Query API. The Retrieve API has different parameters and functionality compared to the Query API.

  2. Attribute Key: The correct attribute key for the document URI is typically '_source_uri' or 'SourceURI', so your approach there is correct.

  3. Filter Operation: Instead of 'ContainsAny', you might want to use 'Contains' since you're looking for a specific string within the URI.

  4. Value Type: For a single string value, use 'StringValue' instead of 'StringListValue'.

Here's a suggested modification to your code, using the Query API instead of Retrieve:

response = client.query(
    IndexId=KENDRA_INDEX_ID,
    QueryText=query,
    PageSize=page_size,
    PageNumber=page_number,
    AttributeFilter={
        'Contains': {
            'Key': '_source_uri',
            'Value': {'StringValue': 'client-138'}
        }
    }
)

If you specifically need to use the Retrieve API, note that it might not support the same filtering capabilities as the Query API. In that case, you may need to perform the filtering on the client side after retrieving the results.

Also, ensure that the documents in your index have the correct metadata for the '_source_uri' field and that it contains the value you're filtering for. If the filtering still doesn't work, you might want to check your index configuration and verify that the attribute you're filtering on is properly set up and indexed.
Sources
Choose specific documents from Kendra index for RAG | AWS re:Post
Query - Amazon Kendra API Reference
Filtering and facet search - Amazon Kendra

answered 6 months ago
  • 'Contains' is not a supported parameter


The 'ContainsAny' filter is not working because it only works with attributes of the 'StringList' type, and '_source_uri' is a reserved index field of the 'String' type. As an alternative, you can filter the results with custom logic on the client side. Sample logic is shared below:

response = kendra.query(
    IndexId=INDEX_ID,
    QueryText=QUERY_TEXT,
    PageSize=100,
    # Ask Kendra to return '_source_uri' with each result item
    RequestedDocumentAttributes=['_source_uri']
)

# Keep only results whose source URI contains '/client-138/'
filtered_docs = [
    item for item in response['ResultItems']
    if any(
        attr['Key'] == '_source_uri' and
        '/client-138/' in attr['Value'].get('StringValue', '')
        for attr in item.get('DocumentAttributes', [])
    )
]
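
The same client-side idea can be tried without a live index. The following is a minimal sketch, assuming result items shaped like those in the snippet above (a 'DocumentAttributes' list of Key/Value entries). The helper name filter_by_source_uri and the sample items are hypothetical and only illustrate the matching logic:

```python
def filter_by_source_uri(result_items, needle):
    """Keep the items whose '_source_uri' attribute contains `needle`."""
    kept = []
    for item in result_items:
        for attr in item.get('DocumentAttributes', []):
            uri = attr.get('Value', {}).get('StringValue', '')
            if attr.get('Key') == '_source_uri' and needle in uri:
                kept.append(item)
                break  # one matching attribute is enough for this item
    return kept


# Hypothetical items standing in for response['ResultItems']
sample_items = [
    {'Id': 'a', 'DocumentAttributes': [
        {'Key': '_source_uri',
         'Value': {'StringValue': 's3://production-bucket/client-138/xyz.pdf'}}]},
    {'Id': 'b', 'DocumentAttributes': [
        {'Key': '_source_uri',
         'Value': {'StringValue': 's3://production-bucket/client-200/abc.pdf'}}]},
    {'Id': 'c'},  # item without any document attributes
]

matches = filter_by_source_uri(sample_items, '/client-138/')
```

Because the filtering happens after a page has been returned, a page can yield fewer matches than the requested PageSize, so several pages may need to be fetched to collect enough results.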
AWS
SUPPORT ENGINEER
answered 5 months ago

The earlier answer, under "Filter Operation", suggested using 'Contains' instead of 'ContainsAny' to match a specific substring, so I tried that.

The "name" attribute is configured in the index with the following settings: Facetable: true, Searchable: true, Displayable: true, and Sortable: true. Here's the query I attempted:

Query parameters:

{
  "IndexId": "a9826c71-b19f-48af-8420-402613d9a91d",
  "PageSize": 100,
  "QueryText": "*",
  "AttributeFilter": {
    "Contains": {
      "Key": "name",
      "Value": { "StringValue": "john" }
    }
  }
}

However, this resulted in an error: Error querying Kendra (query): Parameter validation failed: Unknown parameter in AttributeFilter: "Contains", must be one of: AndAllFilters, OrAllFilters, NotFilter, EqualsTo, ContainsAll, ContainsAny, GreaterThan, GreaterThanOrEquals, LessThan, LessThanOrEquals

I've also checked the official documentation, and it confirms that "Contains" is not a valid filter operation in Kendra.

Could you please advise how I can achieve a partial string match (e.g., filter where "name" contains "john"), given that the attribute is a StringValue?
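
One possible direction, sketched under assumptions: the error above lists no substring operator, so an exact match on a dedicated attribute might stand in for the partial match. The sketch below assumes a custom String attribute, hypothetically named 'client_id', has been defined in the index and populated at ingestion time with the first path segment of each document's S3 URI. The helper names are hypothetical as well; only the 'EqualsTo' operation and the Key/Value shape come from the error message and the filters shown in this thread.

```python
from urllib.parse import urlparse


def client_id_from_uri(uri):
    """Return the first path segment of an S3 URI, or None if there is none."""
    path = urlparse(uri).path                      # '/client-138/xyz.pdf'
    segments = [s for s in path.split('/') if s]   # drop empty segments
    return segments[0] if segments else None


def build_equals_filter(key, value):
    """Build an AttributeFilter dict that matches `key` exactly against `value`."""
    return {'EqualsTo': {'Key': key, 'Value': {'StringValue': value}}}


client_id = client_id_from_uri('s3://production-bucket/client-138/xyz.pdf')

# 'client_id' is an assumed custom attribute, not a reserved Kendra field
attribute_filter = build_equals_filter('client_id', client_id)
```

The resulting dict would be passed as AttributeFilter=attribute_filter. This only helps when the value to match can be computed up front and stored on the document; a free-form substring search such as "name contains john" would still need client-side filtering.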

answered 4 months ago
