We have an AWS Bedrock Knowledge Base. When we update its data source (i.e., overwrite existing files in the S3 bucket), the sync jobs fail. Below are some of the logs:
```json
{
  "event_timestamp": 1724220785008,
  "event": {
    "ingestion_job_id": "*****************",
    "document_location": {
      "type": "S3",
      "s3_location": {
        "uri": "s3://#############/###########/#########/some_file_name.json"
      }
    },
    "chunk_statistics": {
      "ignored": 0,
      "metadata_updated": 0,
      "failed_to_update_metadata": 78,
      "deleted": 0,
      "failed_to_delete": 0,
      "created": 0,
      "failed_to_create": 0
    },
    "data_source_id": "****************",
    "knowledge_base_arn": "*******************************",
    "status": "FAILED"
  },
  "event_version": "1.0",
  "event_type": "StartIngestionJob.ResourceStatusChanged",
  "level": "INFO"
}
```
Another sync example:

```json
{
  "event_timestamp": 1724220785008,
  "event": {
    "ingestion_job_id": "#################",
    "document_location": {
      "type": "S3",
      "s3_location": {
        "uri": "s3://#############/#########/#####/some_random_filename.json"
      }
    },
    "chunk_statistics": {
      "ignored": 0,
      "metadata_updated": 0,
      "failed_to_update_metadata": 74,
      "deleted": 36,
      "failed_to_delete": 0,
      "created": 35,
      "failed_to_create": 0
    },
    "data_source_id": "#############",
    "knowledge_base_arn": "##################",
    "status": "PARTIALLY_INDEXED"
  },
  "event_version": "1.0",
  "event_type": "StartIngestionJob.ResourceStatusChanged",
  "level": "INFO"
}
```
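To see which files are affected across many sync events, a small script can scan the log records and flag every event whose `chunk_statistics` contain a non-zero `failed_*` counter. This is a minimal sketch that assumes each record is one JSON string shaped like the `StartIngestionJob.ResourceStatusChanged` events above; the helper name `find_failed_documents` is hypothetical.

```python
import json
from typing import Iterable


def find_failed_documents(log_lines: Iterable[str]) -> list[dict]:
    """Summarize every log event whose chunk statistics report failures.

    Hypothetical helper: assumes each element of `log_lines` is a single
    JSON record in the StartIngestionJob.ResourceStatusChanged format.
    """
    failures = []
    for line in log_lines:
        record = json.loads(line)
        event = record.get("event", {})
        stats = event.get("chunk_statistics", {})
        # Keep only counters named "failed_*" that are greater than zero.
        failed = {k: v for k, v in stats.items() if k.startswith("failed_") and v > 0}
        if failed:
            location = event.get("document_location", {}).get("s3_location", {})
            failures.append({
                "uri": location.get("uri"),
                "status": event.get("status"),
                "failed": failed,
            })
    return failures
```

Running it over both records above would return one entry per file, each showing only `failed_to_update_metadata`, which makes it easier to check whether the failures always involve metadata updates on overwritten files.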
We suspect that when a file is overwritten in place, the new version produces a different number of chunks than the old one did.
However, if we delete the problematic file and then upload it again, it syncs perfectly.
Also, can anyone confirm what the "PARTIALLY_INDEXED" status means in this context?
Any advice would be appreciated.
We do use metadata files. Once the relevant JSON files are generated, their corresponding metadata files are also inserted or replaced in the S3 bucket that serves as the Knowledge Base's data source. We then run the sync job, during which Bedrock handles the chunking, embedding, and chunk updates.
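The delete-and-reupload workaround described above, applied together with the metadata files, could be sketched with boto3 as follows. This is a minimal sketch, not our actual pipeline: the helper `replace_document_and_sync` and its parameters are hypothetical, it assumes each metadata file follows Bedrock's `<document name>.metadata.json` naming convention next to its document, and it assumes a single sync after the re-upload is enough.

```python
def replace_document_and_sync(s3, bedrock_agent, bucket: str, key: str,
                              local_doc: str, local_metadata: str,
                              kb_id: str, data_source_id: str) -> str:
    """Delete a document and its metadata, re-upload both, then start one sync.

    Hypothetical helper. `s3` is expected to be a boto3 S3 client and
    `bedrock_agent` a boto3 "bedrock-agent" client.
    """
    # Assumed naming convention: metadata sits next to the document as <key>.metadata.json.
    metadata_key = f"{key}.metadata.json"

    # Step 1: remove the old objects instead of overwriting them in place.
    for k in (key, metadata_key):
        s3.delete_object(Bucket=bucket, Key=k)

    # Step 2: upload the regenerated document and its metadata file.
    s3.upload_file(Filename=local_doc, Bucket=bucket, Key=key)
    s3.upload_file(Filename=local_metadata, Bucket=bucket, Key=metadata_key)

    # Step 3: start a single ingestion (sync) job for the data source.
    response = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=data_source_id
    )
    return response["ingestionJob"]["ingestionJobId"]
```

In practice the clients would come from `boto3.client("s3")` and `boto3.client("bedrock-agent")`; passing them in as arguments keeps the function easy to exercise with stand-in objects.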