Hi,
I think the key part of your error message is "EMBEDDING FAILED". I suggest calling the embedding model you configured directly on the 100+ failing files, to get a more specific error message about the embedding problem so that you can fix it.
Bedrock KB does not always surface the full error messages of the subsystems it calls (here, the embedding model). So, if you replicate KB's calls to this model yourself, you should get more precise error messages from the model itself rather than via KB.
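As a rough sketch of what such a replication could look like: split a failing file into chunks, send each chunk to the embedding model through the Bedrock runtime `invoke_model` API, and keep the raw exception text per chunk. The model ID is a placeholder for whichever model your KB is configured with, and the request fields follow the Cohere Embed body format. The helpers `naive_chunks`, `build_request` and `find_failing_chunks`, the 1000-character window, and the file name in the usage comment are all assumptions for illustration; KB's real chunking and request parameters may differ.

```python
import json

MODEL_ID = "cohere.embed-multilingual-v3"  # assumption: replace with your KB's embedding model
CHUNK_CHARS = 1000  # assumption: arbitrary window, not KB's real chunking


def naive_chunks(text, size=CHUNK_CHARS):
    """Split text into fixed-size character windows (a stand-in for KB chunking)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_request(chunk):
    """Build a Cohere Embed request body for a single chunk."""
    return json.dumps({
        "texts": [chunk],
        "input_type": "search_document",
        "truncate": "NONE",  # ask for an error instead of silent truncation
    })


def find_failing_chunks(client, text):
    """Embed each chunk separately; return (chunk_index, error) for every failure."""
    failures = []
    for index, chunk in enumerate(naive_chunks(text)):
        try:
            response = client.invoke_model(
                modelId=MODEL_ID,
                body=build_request(chunk),
                contentType="application/json",
                accept="application/json",
            )
            json.loads(response["body"].read())  # make sure the reply parses
        except Exception as exc:  # keep the raw message: that is the point
            failures.append((index, f"{type(exc).__name__}: {exc}"))
    return failures


# Usage (needs boto3 and AWS credentials, so left as comments):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   text = open("failing_file.txt", encoding="utf-8").read()
#   for index, error in find_failing_chunks(client, text):
#       print(index, error)
```

Calling the model one chunk at a time is slower than batching, but it ties each error message to a specific position in the file.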
You may also find my article useful to understand what KB does under the hood: https://repost.aws/articles/AR-LV1HoR_S0m-qy89wXwHmw/the-leverage-of-llm-system-prompt-by-knowledge-bases-for-bedrock-in-rag-workflows
Best,
Didier
The issue you're experiencing with syncing your AWS Bedrock Knowledge Base appears to be related to specific files in your Arabic corpus. While the majority of your files (1400+) synced successfully, around 100 files are failing. This could be due to several factors:
- File content: There might be something in the content of these specific files that the embedding process is unable to handle. This could be related to special characters, formatting, or encoding issues specific to Arabic text.
- Metadata issues: Although the metadata example you provided seems correctly formatted, there could be inconsistencies or errors in the metadata files for the failing documents.
- File size: Ensure that the failing files are not exceeding any size limits for ingestion.
- Encoding: Check if the failing files have a different encoding than the successfully synced ones.
- Special characters: Look for any unusual characters or formatting in the failing files that might be causing issues during the embedding process.
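The checks above can be roughed out in a few lines of standard-library Python. This is only a sketch: the helper names (`inspect_bytes`, `check_metadata`, `inspect_file`) are made up for illustration, the 50 MB ceiling is an assumption to verify against the current Knowledge Base quotas, and the metadata check only looks for valid JSON with a top-level `metadataAttributes` object. The set of "suspect" Unicode categories is likewise a heuristic, not a list of characters known to break embedding.

```python
import json
import unicodedata
from collections import Counter
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # assumption: verify against the current KB quotas
SUSPECT_CATEGORIES = {"Cc", "Cf", "Co", "Cn"}  # control, format, private use, unassigned
ALLOWED = {"\n", "\r", "\t"}


def inspect_bytes(raw, max_bytes=MAX_BYTES):
    """Report size, UTF-8 validity and unusual code points for one document."""
    report = {
        "size_bytes": len(raw),
        "too_large": len(raw) > max_bytes,
        "has_bom": raw.startswith(b"\xef\xbb\xbf"),
        "utf8_ok": True,
        "suspect_chars": {},
    }
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        report["utf8_ok"] = False
        report["decode_error"] = str(exc)
        return report
    counts = Counter(
        f"U+{ord(ch):04X}"
        for ch in text
        # U+FFFD marks bytes that were already lost in an earlier conversion
        if ch == "\ufffd"
        or (ch not in ALLOWED and unicodedata.category(ch) in SUSPECT_CATEGORIES)
    )
    report["suspect_chars"] = dict(counts)
    return report


def check_metadata(raw):
    """Return a list of problems found in a metadata file's bytes."""
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return [f"not valid UTF-8 JSON: {exc}"]
    if not isinstance(data, dict) or not isinstance(data.get("metadataAttributes"), dict):
        return ["missing top-level 'metadataAttributes' object"]
    return []


def inspect_file(path):
    """Convenience wrapper: read a file from disk and inspect its bytes."""
    return inspect_bytes(Path(path).read_bytes())
```

Running `inspect_file` over both a failing and a successfully synced document, then comparing the two reports, is a quick way to see whether encoding or stray characters separate the two groups.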
To troubleshoot this issue:
- Examine the content of the failing files to see if there are any commonalities or unusual elements.
- Double-check the metadata files for the failing documents to ensure they are correctly formatted and contain valid information.
- Try syncing a small batch of just the failing files to see if you can isolate the issue.
- If possible, try processing these files through a text normalization or cleaning step before ingestion.
- Consider using a chunking strategy, as this might help with processing larger or more complex documents.
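For the normalization step, a minimal sketch could look like the following. It applies Unicode NFKC (which folds Arabic presentation forms back to their base letters), drops the tatweel elongation character, strips control and format characters, and tidies whitespace. The function name `normalize_text` and the choice of what to strip are assumptions; in particular, removing format characters also removes the zero-width non-joiner, which is meaningful in some Arabic-script languages such as Persian, so adjust the rules to your corpus.

```python
import re
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation mark, purely typographic
KEEP = {"\n", "\t"}  # control characters worth preserving
STRIP_CATEGORIES = {"Cc", "Cf", "Co", "Cn"}  # control, format, private use, unassigned


def normalize_text(text):
    """Return a cleaned copy of text suitable for re-upload and re-sync."""
    # Fold compatibility forms (e.g. Arabic presentation forms) to base letters.
    text = unicodedata.normalize("NFKC", text)
    # Unify line endings before control characters are filtered.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace(TATWEEL, "")
    # Drop control/format/private-use/unassigned code points, keeping \n and \t.
    text = "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) not in STRIP_CATEGORIES
    )
    # Collapse runs of spaces/tabs and excessive blank lines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Writing the normalized text to a copy of the file, rather than overwriting the original, keeps it possible to compare sync results before and after cleaning.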
If the issue persists after these steps, you may need to contact AWS Support for more detailed investigation, as the error message suggests an internal processing issue that might require their expertise to resolve.
Sources
Data Sync in bedrock knowledge base is failing | AWS re:Post
Bedrock knowledge base data source sync is not working | AWS re:Post
AWS Bedrock Knowledge Base not syncing when files are being over-written in S3 Bucket | AWS re:Post

Thanks for the response.
I'm using cohere.embed-multilingual-v3.
When I invoke the model manually on one of my failed files, the call succeeds. How does Bedrock KB call the model under the hood?
This is a snippet of my code: