Yes, there is a way to reduce the need for individual metadata.json files when ingesting documents into Amazon Bedrock Knowledge Base. Instead of creating separate metadata files for each document, you can use a CSV (Comma-Separated Values) file to store metadata collectively for multiple documents.
Here's how you can implement this approach:
- Create a single CSV file that contains metadata for all your documents. This file should have a header row with column names representing different metadata fields.
- Upload this CSV file to your S3 bucket along with your documents.
- Create a single metadata.json file for the CSV file. This metadata.json file will provide instructions on how to treat the columns in your CSV file.
- In the metadata.json file, you'll need to specify:
  - The metadataAttributes section, which defines the attributes and their types.
  - The documentStructureConfiguration section, which tells the knowledge base how to interpret the CSV file.
- In the documentStructureConfiguration, you'll specify:
  - The content fields (columns that contain the actual document content)
  - The metadata fields (columns that contain metadata about the documents)
By using this method, you can have a single CSV file and a corresponding metadata.json file, instead of potentially having hundreds or thousands of individual metadata files.
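As a rough illustration of the steps above, here is a minimal Python sketch that builds one CSV and its companion metadata file in memory. The column names, row values, and the `repository` attribute are hypothetical. The JSON key names (`RECORD_BASED_STRUCTURE_METADATA`, `recordBasedStructureMetadata`, `contentFields`, `metadataFieldsSpecification`, `fieldsToInclude`) and the `<name>.csv.metadata.json` naming pattern are assumptions based on my reading of the Bedrock documentation listed under Sources, so please verify them there before relying on this.

```python
import csv
import io
import json

# Hypothetical rows: one content column plus two metadata columns.
ROWS = [
    {"content": "How to rotate API keys.", "source_path": "guides/keys.md", "team": "platform"},
    {"content": "Release checklist for v2.", "source_path": "guides/release.md", "team": "release"},
]


def build_csv(rows):
    """Serialize rows to CSV text with a header row and CRLF line endings."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_metadata(content_field, metadata_fields, file_attributes):
    """Build the companion metadata document (key names are assumptions)."""
    return {
        # Attributes that describe the CSV file as a whole.
        "metadataAttributes": file_attributes,
        "documentStructureConfiguration": {
            "type": "RECORD_BASED_STRUCTURE_METADATA",
            "recordBasedStructureMetadata": {
                # Column(s) holding the text to be indexed.
                "contentFields": [{"fieldName": content_field}],
                # Column(s) to attach as metadata to each row.
                "metadataFieldsSpecification": {
                    "fieldsToInclude": [{"fieldName": name} for name in metadata_fields]
                },
            },
        },
    }


def metadata_filename(csv_name):
    """Assumed naming pattern: the CSV file name plus '.metadata.json'."""
    return csv_name + ".metadata.json"


csv_text = build_csv(ROWS)
metadata = build_metadata("content", ["source_path", "team"], {"repository": "example-repo"})
print(metadata_filename("documents.csv"))
print(json.dumps(metadata, indent=2))
```

Writing `csv_text` (encoded as UTF-8) and the JSON to the same S3 prefix would then give you the single CSV and single metadata file described above.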
Remember to ensure that:
- Your CSV file is in RFC4180 format and UTF-8 encoded.
- The first row of your CSV includes header information.
- Metadata fields provided in your metadata.json are present as columns in your CSV.
- The CSV file and its metadata.json file are properly named and located in your S3 bucket.
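The first three checks in this list can be automated before you upload anything. The sketch below is one possible pre-upload check, not an official validator: the function name `check_csv_against_metadata` is hypothetical, and the metadata key names it reads (`recordBasedStructureMetadata`, `contentFields`, `metadataFieldsSpecification`, `fieldsToInclude`) are assumptions taken from the Bedrock documentation and should be confirmed there.

```python
import csv
import io


def check_csv_against_metadata(csv_bytes, metadata):
    """Return a list of problems found; an empty list means the checks passed."""
    # Check 1: the file must decode as UTF-8.
    try:
        text = csv_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return ["CSV is not valid UTF-8"]

    # Check 2: the first row must be a header with non-empty column names.
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if not header or any(not name.strip() for name in header):
        return ["first row must be a header with non-empty column names"]

    problems = []

    # Check 3: every field named in the metadata must exist as a column.
    record = metadata.get("documentStructureConfiguration", {}).get(
        "recordBasedStructureMetadata", {}
    )
    names = [field["fieldName"] for field in record.get("contentFields", [])]
    spec = record.get("metadataFieldsSpecification", {})
    names += [field["fieldName"] for field in spec.get("fieldsToInclude", [])]
    for name in names:
        if name not in header:
            problems.append(f"field {name!r} is not a CSV column")

    # Extra sanity check: every record should have as many fields as the header.
    for record_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(f"record {record_no} has {len(row)} fields, expected {len(header)}")

    return problems
```

For example, calling it with the bytes of your CSV and the parsed contents of its metadata.json returns an empty list when everything lines up, and a list of human-readable messages otherwise. It does not check the file naming or S3 location, which you still need to confirm by hand.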
This approach can significantly simplify your metadata management, especially in a production setup with many documents from a GitHub repository. It allows you to centralize your metadata in one file, making it easier to maintain and update as needed.
Sources
Include metadata in a data source to improve knowledge base query - Amazon Bedrock

Given the S3 structure:
├── docs-A/
│   ├── mydata.md
├── docs-B/
│   ├── another_data.docx
Where is the appropriate location for the CSV file? What do the CSV file and metadata look like?