Knowledge Base for Bedrock Indexing

I have a question about Knowledge Bases for Amazon Bedrock. Say I have 3 documents in the S3 data source of a knowledge base and I add two more, then run start_ingestion_job from a Lambda function. Will it re-index the entire S3 data source into the knowledge base, or only embed the newly added documents?

It would be great if you could help me with this.

Thanks and Regards.

Asked 2 months ago, 242 views
2 Answers

Accepted Answer

Hi,

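You can get this information by running aws bedrock-agent list-ingestion-jobs --knowledge-base-id <value> --data-source-id <value>. The same statistics are returned for a single job by get-ingestion-job; see https://docs.aws.amazon.com/cli/latest/reference/bedrock-agent/get-ingestion-job.html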

"ingestionJobSummaries": [
        {
            "dataSourceId": "<ds-id>",
            "description": "ds sync after S3 refresh: 2024-05-30-10-45-17",
            "ingestionJobId": "7JLW4JTNVS",
            "knowledgeBaseId": "<kb-id>",
            "startedAt": "2024-05-30 08:45:18.918511+00:00",
            "statistics": {
                "numberOfDocumentsDeleted": 0,
                "numberOfDocumentsFailed": 1,
                "numberOfDocumentsScanned": 641,
                "numberOfMetadataDocumentsModified": 0,
                "numberOfMetadataDocumentsScanned": 0,
                "numberOfModifiedDocumentsIndexed": 0,
                "numberOfNewDocumentsIndexed": 640
            },
            "status": "COMPLETE",
            "updatedAt": "2024-05-30 08:51:30.050024+00:00"
        }
    ]

Explanations:

numberOfDocumentsDeleted -> (long) : The number of source documents that were deleted.
numberOfDocumentsFailed -> (long): The number of source documents that failed to be ingested.
numberOfDocumentsScanned -> (long) : The total number of source documents that were scanned. Includes new, updated, and unchanged documents.
numberOfMetadataDocumentsModified -> (long) : The number of metadata files that were updated or deleted.
numberOfMetadataDocumentsScanned -> (long) : The total number of metadata files that were scanned. Includes new, updated, and unchanged files.
numberOfModifiedDocumentsIndexed -> (long) : The number of modified source documents in the data source that were successfully indexed.
numberOfNewDocumentsIndexed -> (long): The number of new source documents in the data source that were successfully indexed.
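To make the fields above easier to read at a glance, here is a minimal sketch that splits a statistics object into "embedded in this run" versus "left untouched". The helper name summarize_sync is hypothetical, and the arithmetic assumes that numberOfDocumentsScanned is the sum of new, modified, failed, and unchanged documents, which matches the sample output above but is not something the API reference spells out.

```python
def summarize_sync(statistics):
    """Split ingestion-job statistics into embedded vs. untouched documents.

    `statistics` is the "statistics" dict of one ingestion job.
    """
    new = statistics.get("numberOfNewDocumentsIndexed", 0)
    modified = statistics.get("numberOfModifiedDocumentsIndexed", 0)
    failed = statistics.get("numberOfDocumentsFailed", 0)
    scanned = statistics.get("numberOfDocumentsScanned", 0)
    deleted = statistics.get("numberOfDocumentsDeleted", 0)

    # Assumption: scanned = new + modified + failed + unchanged.
    unchanged = max(scanned - new - modified - failed, 0)

    return {
        "embedded": new + modified,  # documents (re)indexed by this job
        "unchanged": unchanged,      # scanned but not re-embedded
        "failed": failed,
        "deleted": deleted,
    }


# Statistics copied from the sample job shown above.
sample = {
    "numberOfDocumentsDeleted": 0,
    "numberOfDocumentsFailed": 1,
    "numberOfDocumentsScanned": 641,
    "numberOfMetadataDocumentsModified": 0,
    "numberOfMetadataDocumentsScanned": 0,
    "numberOfModifiedDocumentsIndexed": 0,
    "numberOfNewDocumentsIndexed": 640,
}

print(summarize_sync(sample))
```

If you add two files to a data source that already holds three synced files, you would expect "embedded" to be 2 and "unchanged" to be 3 for the next job.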

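So, the answer to your question is in numberOfNewDocumentsIndexed and numberOfModifiedDocumentsIndexed: the sync is incremental. Bedrock detects new and modified documents (via some form of checksum) and indexes only those; unchanged documents are scanned but not re-embedded.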
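If you start the job from Lambda with boto3, note that start_ingestion_job returns as soon as the job is created, so the statistics in that first response are likely to still be zero. A rough sketch of starting a sync and polling get_ingestion_job until it finishes is below. The function name sync_and_wait, the polling interval, and the set of terminal statuses are my assumptions; the client is passed in so the function can be tried without AWS access.

```python
import time

# Assumption: statuses after which the job will not change any more.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_and_wait(client, knowledge_base_id, data_source_id,
                  poll_seconds=10, max_polls=60):
    """Start an ingestion job and poll until it reaches a terminal status.

    `client` is expected to behave like boto3.client("bedrock-agent").
    Returns the final "ingestionJob" dict, including its statistics.
    """
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    polls = 0
    while job["status"] not in TERMINAL_STATUSES:
        if polls >= max_polls:
            raise TimeoutError(
                f"Ingestion job {job['ingestionJobId']} still {job['status']}"
            )
        time.sleep(poll_seconds)
        # Re-read the job; statistics are filled in as the sync progresses.
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
        polls += 1

    return job
```

In a real Lambda you would call it as sync_and_wait(boto3.client("bedrock-agent"), "<kb-id>", "<ds-id>") and read job["statistics"] from the result. Keep in mind that a Lambda invocation is capped at 15 minutes, so for large data sources it may be better to start the job in one invocation and check list-ingestion-jobs later.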

Best,

Didier

AWS Expert, answered 2 months ago

Thank you, Didier, for your answer.

I was checking the response object, but every time I got 0 for all fields, such as documents scanned and newly indexed. Yet when I searched the knowledge base, it reflected the current data source correctly, so I was confused as to whether it re-indexes all documents even when they are unchanged. Thanks for your explanation.

Thanks again and regards, Nithin

Answered 2 months ago
  • Hi Nithin, you're welcome! Thanks for accepting my answer.
