
AWS Bedrock Knowledge Base Incremental Sync Issues [S3 and MongoDB Vector Store]


I have a process that continually inserts new documents into an S3 bucket or overwrites existing documents with updated versions. For the updated documents, my sync job fails to update their metadata for an unknown reason. I have compared the metadata and content files, and there are no significant changes in either. Below is a snippet of the error log from CloudWatch for the sync job:

```json
"chunk_statistics": {
    "ignored": 0,
    "metadata_updated": 0,
    "failed_to_update_metadata": 7,
    "deleted": 0,
    "failed_to_delete": 0,
    "created": 0,
    "failed_to_create": 0
},
"data_source_id": "",
"knowledge_base_arn": "",
"status": "FAILED"
```
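For reference, each metadata file follows the sidecar format that Bedrock Knowledge Bases expects for S3 data sources, i.e. a `<document-name>.metadata.json` file stored next to the source document. A minimal example (the attribute names here are placeholders, not our real keys):

```json
{
    "metadataAttributes": {
        "department": "finance",
        "year": 2024,
        "is_public": true
    }
}
```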

Can anyone help with this?

1 Answer

@grazitti,

Greetings! By any chance, can you provide additional information from the CloudWatch logs (anything related to the reason behind the sync job's failure)? Also, are you following this documentation when performing the sync job?
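If it helps, you can also pull the job-level failure details directly from the Bedrock API rather than digging through CloudWatch. A minimal sketch using boto3 (the knowledge base, data source, and ingestion job IDs below are placeholders):

```python
import boto3

# The bedrock-agent client exposes the knowledge base ingestion APIs.
client = boto3.client("bedrock-agent", region_name="us-east-1")

# Placeholder IDs -- substitute your own knowledge base / data source / job IDs.
response = client.get_ingestion_job(
    knowledgeBaseId="KB_ID",
    dataSourceId="DS_ID",
    ingestionJobId="JOB_ID",
)

job = response["ingestionJob"]
print("Status:", job["status"])
# failureReasons, when populated, is often more specific than the chunk statistics.
print("Failure reasons:", job.get("failureReasons", []))
print("Statistics:", job.get("statistics", {}))
```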

AWS
answered 3 months ago
  • Hi @arjun, thanks for your response.

    Yes, we are following the documentation when performing the syncs, i.e. content file size < 50 MB, metadata file size < 10 KB, and every key in the metadata file has a value of type string, number, or Boolean only (a quick script for double-checking these constraints is sketched below). As for the CloudWatch logs, unfortunately this is all the information we get for the failed syncs:

    { "event_timestamp": 1724553492795, "event": { "ingestion_job_id": "XXXX", "document_location": { "type": "S3", "s3_location": { "uri": "s3://XXXX.json" } }, "chunk_statistics": { "ignored": 0, "metadata_updated": 0, "failed_to_update_metadata": 180, "deleted": 0, "failed_to_delete": 0, "created": 0, "failed_to_create": 0 }, "data_source_id": "XXXX", "knowledge_base_arn": "XXXX", "status": "FAILED" }, "event_version": "1.0", "event_type": "StartIngestionJob.ResourceStatusChanged", "level": "INFO" }

    Not sure what the exact issue is here. Typically, when I ingest fresh data, the sync process never fails. But if I ingest the same files again into the source S3 bucket, the sync fails for those files with the information above.
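    For completeness, below is the kind of pre-sync check we run against the source bucket; a minimal sketch assuming a flat bucket layout, the `<name>.metadata.json` sidecar convention, and a placeholder bucket name:

```python
import json

import boto3

BUCKET = "my-kb-source-bucket"          # placeholder bucket name
MAX_CONTENT_BYTES = 50 * 1024 * 1024    # 50 MB content file limit
MAX_METADATA_BYTES = 10 * 1024          # 10 KB metadata file limit

s3 = boto3.client("s3")

def validate_metadata_file(key: str) -> list[str]:
    """Check one .metadata.json sidecar against the documented constraints."""
    problems = []
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    if len(body) > MAX_METADATA_BYTES:
        problems.append(f"{key}: metadata file exceeds 10 KB")
    attributes = json.loads(body).get("metadataAttributes", {})
    for name, value in attributes.items():
        # Only strings, numbers, and Booleans are allowed as attribute values.
        if not isinstance(value, (str, int, float, bool)):
            problems.append(
                f"{key}: attribute '{name}' has unsupported type "
                f"{type(value).__name__}"
            )
    return problems

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for item in page.get("Contents", []):
        key, size = item["Key"], item["Size"]
        if key.endswith(".metadata.json"):
            for problem in validate_metadata_file(key):
                print(problem)
        elif size > MAX_CONTENT_BYTES:
            print(f"{key}: content file exceeds 50 MB")
```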

    Let me know if you get some clarity on this; maybe we can connect on another channel for prompter communication. TIA.
