Knowledge Base for Bedrock Indexing

I have a question about Knowledge Bases for Amazon Bedrock. Say I have 3 documents in the S3 data source of a knowledge base and I add two more, then run start_ingestion_job from a Lambda function. Will it re-index the entire S3 data source into the knowledge base, or only embed the newly added documents?

It would be great if you could help me with this.

Thanks and Regards.

2 Answers
Accepted Answer

Hi,

You get this information by running aws bedrock-agent list-ingestion-jobs --knowledge-base-id <value> --data-source-id <value>. The same statistics are returned for a single job by get-ingestion-job; see https://docs.aws.amazon.com/cli/latest/reference/bedrock-agent/get-ingestion-job.html

"ingestionJobSummaries": [
        {
            "dataSourceId": "<ds-id>",
            "description": "ds sync after S3 refresh: 2024-05-30-10-45-17",
            "ingestionJobId": "7JLW4JTNVS",
            "knowledgeBaseId": "<kb-id>",
            "startedAt": "2024-05-30 08:45:18.918511+00:00",
            "statistics": {
                "numberOfDocumentsDeleted": 0,
                "numberOfDocumentsFailed": 1,
                "numberOfDocumentsScanned": 641,
                "numberOfMetadataDocumentsModified": 0,
                "numberOfMetadataDocumentsScanned": 0,
                "numberOfModifiedDocumentsIndexed": 0,
                "numberOfNewDocumentsIndexed": 640
            },
            "status": "COMPLETE",
            "updatedAt": "2024-05-30 08:51:30.050024+00:00"
        }
    ]

Explanations:

numberOfDocumentsDeleted -> (long): The number of source documents that were deleted.
numberOfDocumentsFailed -> (long): The number of source documents that failed to be ingested.
numberOfDocumentsScanned -> (long): The total number of source documents that were scanned. Includes new, updated, and unchanged documents.
numberOfMetadataDocumentsModified -> (long): The number of metadata files that were updated or deleted.
numberOfMetadataDocumentsScanned -> (long): The total number of metadata files that were scanned. Includes new, updated, and unchanged files.
numberOfModifiedDocumentsIndexed -> (long): The number of modified source documents in the data source that were successfully indexed.
numberOfNewDocumentsIndexed -> (long): The number of new source documents in the data source that were successfully indexed.
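
To make these counters easier to read, here is a small Python sketch that condenses a statistics dict into two numbers: how many documents were actually indexed in this run, and how many were scanned but left alone. The helper name summarize_ingestion is hypothetical, and the "scanned but not indexed" figure is my own derivation (scanned minus indexed minus failed), not a field the API returns.

```python
def summarize_ingestion(stats):
    """Condense ingestion job statistics into two counts (hypothetical helper)."""
    new = stats.get("numberOfNewDocumentsIndexed", 0)
    modified = stats.get("numberOfModifiedDocumentsIndexed", 0)
    failed = stats.get("numberOfDocumentsFailed", 0)
    scanned = stats.get("numberOfDocumentsScanned", 0)

    # Documents that were embedded and written to the index in this run.
    indexed = new + modified
    # Assumption: whatever was scanned but neither indexed nor failed was unchanged.
    scanned_not_indexed = max(scanned - indexed - failed, 0)
    return {"indexed_this_run": indexed, "scanned_not_indexed": scanned_not_indexed}


# Statistics copied from the job summary shown above.
sample = {
    "numberOfDocumentsDeleted": 0,
    "numberOfDocumentsFailed": 1,
    "numberOfDocumentsScanned": 641,
    "numberOfModifiedDocumentsIndexed": 0,
    "numberOfNewDocumentsIndexed": 640,
}
print(summarize_ingestion(sample))
```

For the job shown above this reports 640 documents indexed and none left untouched, which fits a first sync where every document is new.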

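So, the answer to your question is in numberOfNewDocumentsIndexed and numberOfModifiedDocumentsIndexed: the sync is incremental. Bedrock detects new and modified documents (via some form of checksum) and indexes only those; unchanged documents are scanned but not re-embedded. In your case, after adding two documents to the existing three, I would expect numberOfDocumentsScanned to be 5 and numberOfNewDocumentsIndexed to be 2.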
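
If you start the sync from Lambda, keep in mind that the statistics only become meaningful once the job has reached a terminal status. The response returned right after start_ingestion_job describes a job that has only just started. Below is a minimal Python sketch of starting a job and polling get_ingestion_job until it finishes. The function name sync_and_wait, the polling interval, and the poll limit are my own choices. The client argument is assumed to be a boto3 "bedrock-agent" client.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_and_wait(client, knowledge_base_id, data_source_id,
                  poll_seconds=10, max_polls=60):
    """Start an ingestion job, poll until it ends, and return the final job dict.

    `client` is assumed to be boto3.client("bedrock-agent").
    """
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    polls = 0
    while job["status"] not in TERMINAL_STATUSES:
        if polls >= max_polls:
            raise TimeoutError(
                "Ingestion job %s did not finish in time" % job["ingestionJobId"]
            )
        time.sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
        polls += 1
    return job


# Usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# job = sync_and_wait(boto3.client("bedrock-agent"), "<kb-id>", "<ds-id>")
# print(job["status"], job["statistics"])
```

For large data sources the job can outlast the Lambda timeout (15 minutes at most), so it may be more practical to start the job in one invocation and check its status later with get_ingestion_job or list_ingestion_jobs.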

Best,

Didier

Didier (AWS, EXPERT), answered a year ago

Thank you, Didier, for your answer.

I was checking the response object, but every time I got 0 for all fields, such as documents scanned and newly indexed. Yet when I searched the knowledge base, the results reflected the updated data source, so I was unsure whether it was re-indexing all documents, even unchanged ones. Your explanation cleared that up.

Thanks again and Regards. Nithin

answered a year ago
  • Hi Nithin, you're welcome! Thanks for accepting my answer
