
Bedrock Knowledge Base can't ingest and vectorize converted cpp code files


We're looking to hook up a Bedrock Knowledge Base to an S3 bucket containing files from a codebase, including .h and .cpp files. Because Knowledge Bases only accept a fixed list of supported file types, we first renamed these files (e.g. filename.cpp -> filename_cpp.txt).
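A minimal Python sketch of the renaming step, assuming the sources sit in a local checkout before being uploaded to the bucket. `txt_name` and `rename_sources` are hypothetical helper names. The `_h.txt` suffix for headers is also an assumption: it simply extends the `filename_cpp.txt` scheme from the question.

```python
from pathlib import Path


def txt_name(path: Path) -> Path:
    """Map e.g. src/foo.cpp -> src/foo_cpp.txt (hypothetical helper, scheme from the question)."""
    ext = path.suffix.lstrip(".")  # "cpp" or "h"
    return path.with_name(f"{path.stem}_{ext}.txt")


def rename_sources(root: Path, extensions=(".cpp", ".h")) -> list:
    """Rename every matching file under root in place and return the new paths."""
    renamed = []
    # sorted() materializes the listing first, so renaming during the loop is safe.
    for src in sorted(root.rglob("*")):
        if src.is_file() and src.suffix in extensions:
            dst = txt_name(src)
            src.rename(dst)
            renamed.append(dst)
    return renamed


if __name__ == "__main__":
    import sys

    for p in rename_sources(Path(sys.argv[1] if len(sys.argv) > 1 else ".")):
        print(p)
```

Run it against the checkout root (e.g. `python rename_sources.py ./src`), then upload the resulting tree to the bucket used as the data source.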

However, when we set this S3 bucket as a data source and sync, we always get warnings that every one of the cpp files failed with "Encountered error: Ignored x files as their file format was not supported."

We already renamed them to .txt, and S3 does recognize the files as Type=txt. When we download the renamed cpp files, they open as plain text files in Notepad, so we have no idea why only the renamed .cpp files, and not the renamed .h files, are failing to vectorize in the Knowledge Base.


3 Answers
Accepted Answer

The issue was the #include lines in each _cpp.txt file. Once we removed every line starting with #include, the Knowledge Base successfully processed all files into the vector database collection on OpenSearch Serverless.
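Based on what the asker reported, the clean-up step can be sketched in Python as below. It assumes the renamed files are UTF-8 and sit locally under one directory before upload; `strip_includes`, `clean_file` and `INCLUDE_RE` are hypothetical names. The pattern is slightly broader than "rows starting with #include": it also drops indented lines and the `# include` spelling, so narrow `INCLUDE_RE` if you only want the literal prefix.

```python
import re
from pathlib import Path

# Preprocessor include lines, allowing leading whitespace and "#  include" (assumption).
INCLUDE_RE = re.compile(r"^\s*#\s*include\b")


def strip_includes(text: str) -> str:
    """Drop every line that is an #include directive; keep all other lines unchanged."""
    kept = [line for line in text.splitlines(keepends=True) if not INCLUDE_RE.match(line)]
    return "".join(kept)


def clean_file(path: Path) -> int:
    """Rewrite path without its #include lines and return how many lines were removed."""
    original = path.read_text(encoding="utf-8")  # assumes UTF-8 sources
    cleaned = strip_includes(original)
    path.write_text(cleaned, encoding="utf-8")
    return len(original.splitlines()) - len(cleaned.splitlines())


if __name__ == "__main__":
    import sys

    root = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    # Only the renamed .cpp files failed in the thread, so only those are cleaned here.
    for f in sorted(root.rglob("*_cpp.txt")):
        print(f, clean_file(f))
```

Removing the includes also discards the dependency information they carried, which may matter if you want the model to reason about how files relate to each other.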

answered a year ago

Hi,

This is not really an answer to your question, but I can confirm that Knowledge Bases accept source code from other programming languages.

For example, I ran successful trials with Python code. Here is my article with Python source code (Django on GitHub): https://repost.aws/articles/AR-LV1HoR_S0m-qy89wXwHmw/the-leverage-of-llm-system-prompt-by-knowledge-bases-for-bedrock-in-rag-workflows

Could you also try source code in another language, such as Java or Python, to see whether the problem is really specific to C++?

Best.

Didier

EXPERT
answered a year ago
  • Renamed Python files (filename.py -> filename.py.txt) process as expected, as do non-renamed .py and .h files. What is it about C++ .cpp files? They're also just plain text files like the others.


Hi, this may not directly answer your question either, but I wanted to make sure you had looked at Amazon Q Developer as an alternative to Bedrock Knowledge Bases, since this service is optimized to help with software development:

https://aws.amazon.com/q/developer/

C++ is one of the supported languages: https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/q-language-ide-support.html

AWS
EXPERT
answered a year ago
