Knowledge Base for Bedrock Indexing

I have a question about Knowledge Bases for Amazon Bedrock. Say I have 3 documents in the S3 data source of a knowledge base and I add two more, then run start_ingestion_job from a Lambda function. Will it re-index the entire S3 data source into the knowledge base, or only embed the newly added documents?

It would be great if you could help me with this.

Thanks and Regards.

asked 14 days ago, 158 views
2 Answers
Accepted Answer

Hi,

You get this information by running aws bedrock-agent list-ingestion-jobs --knowledge-base-id <value> --data-source-id <value>. The same statistics are returned for a single job by get-ingestion-job. See https://docs.aws.amazon.com/cli/latest/reference/bedrock-agent/get-ingestion-job.html

"ingestionJobSummaries": [
        {
            "dataSourceId": "<ds-id>",
            "description": "ds sync after S3 refresh: 2024-05-30-10-45-17",
            "ingestionJobId": "7JLW4JTNVS",
            "knowledgeBaseId": "<kb-id>",
            "startedAt": "2024-05-30 08:45:18.918511+00:00",
            "statistics": {
                "numberOfDocumentsDeleted": 0,
                "numberOfDocumentsFailed": 1,
                "numberOfDocumentsScanned": 641,
                "numberOfMetadataDocumentsModified": 0,
                "numberOfMetadataDocumentsScanned": 0,
                "numberOfModifiedDocumentsIndexed": 0,
                "numberOfNewDocumentsIndexed": 640
            },
            "status": "COMPLETE",
            "updatedAt": "2024-05-30 08:51:30.050024+00:00"
        }
    ]

Explanations:

numberOfDocumentsDeleted -> (long): The number of source documents that were deleted.
numberOfDocumentsFailed -> (long): The number of source documents that failed to be ingested.
numberOfDocumentsScanned -> (long): The total number of source documents that were scanned. Includes new, updated, and unchanged documents.
numberOfMetadataDocumentsModified -> (long): The number of metadata files that were updated or deleted.
numberOfMetadataDocumentsScanned -> (long): The total number of metadata files that were scanned. Includes new, updated, and unchanged files.
numberOfModifiedDocumentsIndexed -> (long): The number of modified source documents in the data source that were successfully indexed.
numberOfNewDocumentsIndexed -> (long): The number of new source documents in the data source that were successfully indexed.

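So, the answer to your question is in numberOfNewDocumentsIndexed and numberOfModifiedDocumentsIndexed: each sync scans every document in the data source, but Bedrock detects new and modified documents via some form of checksum and indexes only those. Unchanged documents are scanned but not re-embedded.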
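To make the reading of these counters concrete, here is a minimal Python sketch that takes the statistics object of one ingestion job and works out how many documents were scanned but left alone. The function name summarize_sync is hypothetical, and the rule "scanned minus indexed minus failed equals untouched" is my assumption about how the counters relate, not something the API documents. The second sample dictionary is invented to mirror the 3 + 2 documents in the question.

```python
def summarize_sync(stats: dict) -> dict:
    """Summarize one ingestion job's statistics (hypothetical helper)."""
    scanned = stats.get("numberOfDocumentsScanned", 0)
    new = stats.get("numberOfNewDocumentsIndexed", 0)
    modified = stats.get("numberOfModifiedDocumentsIndexed", 0)
    failed = stats.get("numberOfDocumentsFailed", 0)

    indexed = new + modified
    # Assumption: anything scanned that was neither indexed nor failed
    # was recognized as unchanged and skipped.
    untouched = max(scanned - indexed - failed, 0)
    return {"scanned": scanned, "indexed": indexed, "untouched": untouched}


# Statistics copied from the job listed above (first sync of the data source).
first_sync = {
    "numberOfDocumentsDeleted": 0,
    "numberOfDocumentsFailed": 1,
    "numberOfDocumentsScanned": 641,
    "numberOfModifiedDocumentsIndexed": 0,
    "numberOfNewDocumentsIndexed": 640,
}

# Invented values for the scenario in the question: 3 existing documents, 2 added.
second_sync = {
    "numberOfDocumentsScanned": 5,
    "numberOfNewDocumentsIndexed": 2,
}

print(summarize_sync(first_sync))
print(summarize_sync(second_sync))
```

If the untouched count is greater than zero after adding files, the sync was incremental rather than a full re-index.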

Best,

Didier (AWS, EXPERT)
answered 14 days ago

Thank you, Didier, for your answer.

I was checking the response object, but every time I got 0 for all fields, such as documents scanned and newly indexed. Searching the knowledge base still reflected the updated data source, so I was unsure whether it was re-indexing all documents, even the unchanged ones. Thanks for your explanation.
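One possible explanation for the zeros, offered as an assumption rather than a confirmed diagnosis: start_ingestion_job returns as soon as the job is created, so the statistics in that first response have not been filled in yet. The sketch below starts a sync and then polls get_ingestion_job until the job reaches a terminal status. The function name sync_and_wait, the polling interval, and the retry limit are hypothetical choices; the client is expected to be a boto3 "bedrock-agent" client.

```python
import time

# Statuses after which the statistics should no longer change.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_and_wait(client, knowledge_base_id, data_source_id,
                  poll_seconds=10, max_polls=180):
    """Start an ingestion job and return its final description (hypothetical helper)."""
    started = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    # The job has only just been created here, so its statistics are not final.
    job_id = started["ingestionJob"]["ingestionJobId"]

    for _ in range(max_polls):
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job_id,
        )["ingestionJob"]
        if job["status"] in TERMINAL_STATUSES:
            return job
        time.sleep(poll_seconds)

    raise TimeoutError(f"ingestion job {job_id} did not finish in time")
```

In a Lambda function this could be called as sync_and_wait(boto3.client("bedrock-agent"), "<kb-id>", "<ds-id>") and the returned job["statistics"] logged. For large data sources the sync may outlast the Lambda timeout, in which case checking the job later with list-ingestion-jobs is the safer route.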

Thanks again and Regards. Nithin

answered 14 days ago
  • Hi Nithin, you're welcome! Thanks for accepting my answer.
