AWS Bedrock Knowledge Base not syncing when files are being over-written in S3 Bucket

We have an AWS Bedrock Knowledge Base. When we update the data source by overwriting files in the S3 bucket, the sync jobs fail. Some of the logs are below:

{
    "event_timestamp": 1724220785008,
    "event": {
        "ingestion_job_id": "*****************",
        "document_location": {
            "type": "S3",
            "s3_location": {
                "uri": "s3://#############/###########/#########/some_file_name.json"
            }
        },
        "chunk_statistics": {
            "ignored": 0,
            "metadata_updated": 0,
            "failed_to_update_metadata": 78,
            "deleted": 0,
            "failed_to_delete": 0,
            "created": 0,
            "failed_to_create": 0
        },
        "data_source_id": "****************",
        "knowledge_base_arn": "*******************************",
        "status": "FAILED"
    },
    "event_version": "1.0",
    "event_type": "StartIngestionJob.ResourceStatusChanged",
    "level": "INFO"
}

Another sync example:

{
    "event_timestamp": 1724220785008,
    "event": {
        "ingestion_job_id": "#################",
        "document_location": {
            "type": "S3",
            "s3_location": {
                "uri": "s3://#############/#########/#####/some_random_filename.json"
            }
        },
        "chunk_statistics": {
            "ignored": 0,
            "metadata_updated": 0,
            "failed_to_update_metadata": 74,
            "deleted": 36,
            "failed_to_delete": 0,
            "created": 35,
            "failed_to_create": 0
        },
        "data_source_id": "#############",
        "knowledge_base_arn": "##################",
        "status": "PARTIALLY_INDEXED"
    },
    "event_version": "1.0",
    "event_type": "StartIngestionJob.ResourceStatusChanged",
    "level": "INFO"
}
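To make events like the two above easier to compare, here is a minimal Python sketch that reads one log record and reports only the non-zero failure counters. The function name `summarize_event` and the sample bucket name are made up for illustration; the field names are taken from the logs above.

```python
import json

# Counters in "chunk_statistics" that indicate a failed chunk operation.
FAILURE_KEYS = ("failed_to_update_metadata", "failed_to_delete", "failed_to_create")


def summarize_event(raw):
    """Return the document URI, status, and non-zero failure counters of one log record."""
    record = json.loads(raw)
    event = record["event"]
    stats = event["chunk_statistics"]
    failures = {key: stats[key] for key in FAILURE_KEYS if stats.get(key, 0) > 0}
    return {
        "uri": event["document_location"]["s3_location"]["uri"],
        "status": event["status"],
        "failures": failures,
    }


# Sample record shaped like the second log above (the bucket name is a placeholder).
sample = json.dumps({
    "event": {
        "document_location": {
            "type": "S3",
            "s3_location": {"uri": "s3://example-bucket/some_random_filename.json"},
        },
        "chunk_statistics": {
            "ignored": 0, "metadata_updated": 0, "failed_to_update_metadata": 74,
            "deleted": 36, "failed_to_delete": 0, "created": 35, "failed_to_create": 0,
        },
        "status": "PARTIALLY_INDEXED",
    },
})

print(summarize_event(sample))
```

In both logs, the only non-zero failure counter is `failed_to_update_metadata`.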

We suspect the cause is that when a file is overwritten, the number of chunks differs before and after the overwrite.

However, if we delete the problem file and upload it again, it syncs perfectly.
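The delete-and-re-upload workaround could be scripted roughly as follows. This is only a sketch under assumptions: it assumes a sync is run after the delete as well as after the upload (the exact order used is not stated here), and `replace_document`, `run_sync`, and all identifiers in the usage comment are hypothetical. The clients are passed in so the code does not depend on how they are created; with boto3 they would be `boto3.client("s3")` and `boto3.client("bedrock-agent")`.

```python
import time

# Ingestion job statuses after which polling stops.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def run_sync(bedrock_agent, kb_id, ds_id, poll_seconds=10):
    """Start an ingestion job and wait until it reaches a terminal status."""
    job = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=ds_id
    )["ingestionJob"]
    while job["status"] not in TERMINAL_STATUSES:
        time.sleep(poll_seconds)
        job = bedrock_agent.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=ds_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job["status"]


def replace_document(s3, bedrock_agent, bucket, key, local_path, kb_id, ds_id,
                     poll_seconds=10):
    """Delete the object, sync, upload the new version, and sync again."""
    s3.delete_object(Bucket=bucket, Key=key)
    # Assumption: this first sync lets the Knowledge Base drop the old chunks.
    after_delete = run_sync(bedrock_agent, kb_id, ds_id, poll_seconds)
    s3.upload_file(local_path, bucket, key)
    after_upload = run_sync(bedrock_agent, kb_id, ds_id, poll_seconds)
    return after_delete, after_upload


# Hypothetical usage (requires boto3 and real identifiers):
#   import boto3
#   replace_document(boto3.client("s3"), boto3.client("bedrock-agent"),
#                    "example-bucket", "docs/some_file_name.json",
#                    "some_file_name.json", "KB_ID", "DS_ID")
```

If metadata files are in use, the same delete and upload steps would presumably need to cover the companion metadata file as well.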

Also, can anyone confirm what "PARTIALLY_INDEXED" means in this context?

Any advice would be appreciated.

1 Answer

Hi,

Do you attach Bedrock metadata files to your documents?

Judging by the error messages, the metadata appears to be the source of the failure. Are the metadata files properly updated when you update the knowledge files in S3?

Best,

Didier

EXPERT
answered a year ago
  • We do use metadata files. Once the relevant JSON files are generated, their respective metadata files are also inserted or replaced in the S3 bucket that acts as the data source for the Knowledge Base. The sync job is then run, and Bedrock handles the chunking, embedding, and chunk updates.
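One quick sanity check before a sync is to confirm that every document key has its companion metadata key, and that no metadata file is left behind without a document. The sketch below assumes the Bedrock naming convention in which the metadata for `name.json` is stored as `name.json.metadata.json`; the function name `find_unpaired` and the sample keys are illustrative only, and in practice the key list would come from an S3 listing of the data source prefix.

```python
METADATA_SUFFIX = ".metadata.json"


def find_unpaired(keys):
    """Return (documents lacking a metadata file, metadata files lacking a document)."""
    key_set = set(keys)
    missing_metadata = [
        key for key in keys
        if not key.endswith(METADATA_SUFFIX) and key + METADATA_SUFFIX not in key_set
    ]
    orphaned_metadata = [
        key for key in keys
        if key.endswith(METADATA_SUFFIX) and key[: -len(METADATA_SUFFIX)] not in key_set
    ]
    return missing_metadata, orphaned_metadata


# Illustrative keys only.
example_keys = [
    "docs/a.json",
    "docs/a.json.metadata.json",
    "docs/b.json",
    "docs/c.json.metadata.json",
]
missing, orphaned = find_unpaired(example_keys)
print("documents without metadata:", missing)
print("metadata without document:", orphaned)
```

This only checks pairing by name. It does not verify that the metadata content is valid or that it was updated together with the document.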
