
AWS Bedrock Knowledge Base Sync Fail: (Invalid parameter combination....Issue occurred while processing file: <s3 file>. Call to null did not succeed.


As the title suggests, I'm having an issue syncing my knowledge base. I have an Arabic corpus of thousands of books that I want to sync. As a first test I took 2 books and uploaded them to S3, and this is where I encountered my problem.

My S3 folder looks like this, where I created one file per page of each book:

s3-bucket-folder
|__arabic_corpus
     |__book1
          |__volume-01---page-1.txt
          |__volume-01---page-1.txt.metadata.json
          |_ .....
     |__book2
          |__volume-01---page-1.txt
          |__volume-01---page-1.txt.metadata.json
          |_ ...

There are around 1,500 files in total. However, 100+ of them are failing, and I can't figure out why. I've checked the CloudWatch logs, but they aren't very helpful. Here's an example error log from CloudWatch:

{"event_timestamp":1729478666096,"event":{"ingestion_job_id":"xx","document_location":{"type":"S3","s3_location":{"uri":"s3://xx/arabic_corpus/book1/volume-01---page-125.txt"}},"data_source_id":"xx","status_reasons":["Invalid parameter combination. Please check and try again. (Service: BedrockRuntime, Status Code: 400, Request ID: xx)"],"knowledge_base_arn":"arn:aws:bedrock:us-east-1:xx:knowledge-base/xx","status":"EMBEDDING_FAILED"},"event_version":"1.0","event_type":"StartIngestionJob.ResourceStatusChanged","level":"WARN"}

Here's its metadata:

{
  "metadataAttributes": {
    "booktitle": "<arabic title>",
    ....
    "page_id": "volume-01,-page-125"
  }
}

More context: I'm using the no-chunking strategy. The sync worked for 1,400+ files, but 100+ files failed with this issue, and I couldn't figure out why.

2 Answers

Hi,

I think the key part of your error message is "EMBEDDING_FAILED". So, I would suggest that you call the embedding model you configured directly on the 100+ failing files to get a more specific error message about the embedding problem, in order to fix it.

Bedrock KB does not always surface the full error messages of the subsystems that it calls (here, the embedding model). So, if you replicate Bedrock KB's calls to this model on your own, you should obtain more precise error messages from the model itself rather than via KB.

You may also find my article useful to understand what KB does under the hood: https://repost.aws/articles/AR-LV1HoR_S0m-qy89wXwHmw/the-leverage-of-llm-system-prompt-by-knowledge-bases-for-bedrock-in-rag-workflows

Best,

Didier

EXPERT
answered a year ago
  • Thanks for the response.

    I'm using cohere.embed-multilingual-v3.

    And when I try to invoke the model manually for one of my failed files, it succeeds. How does Bedrock KB call the model under the hood?

    This is a snippet of my code:

    import json

    import boto3

    # client for invoking the embedding model
    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    request_body = {
        "texts": [content],  # Arabic content for a single page
        "input_type": "search_document",
    }

    response = bedrock.invoke_model(
        body=json.dumps(request_body),
        modelId="arn:aws:bedrock:us-east-1::foundation-model/cohere.embed-multilingual-v3",
        contentType="application/json",
        accept="*/*",
    )
    

The issue you're experiencing with syncing your AWS Bedrock Knowledge Base appears to be related to specific files in your Arabic corpus. While the majority of your files (1400+) synced successfully, around 100 files are failing. This could be due to several factors:

  1. File content: There might be something in the content of these specific files that the embedding process is unable to handle. This could be related to special characters, formatting, or encoding issues specific to Arabic text.

  2. Metadata issues: Although the metadata example you provided seems correctly formatted, there could be inconsistencies or errors in the metadata files for the failing documents.

  3. File size: Ensure that the failing files are not exceeding any size limits for ingestion.

  4. Encoding: Check if the failing files have a different encoding than the successfully synced ones.

  5. Special characters: Look for any unusual characters or formatting in the failing files that might be causing issues during the embedding process.

To troubleshoot this issue:

  1. Examine the content of the failing files to see if there are any commonalities or unusual elements.

  2. Double-check the metadata files for the failing documents to ensure they are correctly formatted and contain valid information.

  3. Try syncing a small batch of just the failing files to see if you can isolate the issue.

  4. If possible, try processing these files through a text normalization or cleaning step before ingestion.

  5. Consider using a chunking strategy. With no chunking, each file is treated as a single chunk and must fit within the embedding model's input limit, so chunking might help with longer or more complex documents.

If the issue persists after these steps, you may need to contact AWS Support for more detailed investigation, as the error message suggests an internal processing issue that might require their expertise to resolve.

Sources
Data Sync in bedrock knowledge base is failing | AWS re:Post
Bedrock knowledge base data source sync is not working | AWS re:Post
AWS Bedrock Knowledge Base not syncing when files are being over-written in S3 Bucket | AWS re:Post

answered a year ago
