The error from your Amazon Kendra S3 connector relates to the format of your metadata JSON file. The error message indicates that the metadata file does not match the schema that Kendra expects.
Here are some potential issues and solutions:
- Metadata File Naming: Make sure your metadata file follows the correct naming convention. The metadata file should have the same name as your document with ".metadata.json" appended to it. For example, if your document is "mydocument.pdf", the metadata file should be "mydocument.pdf.metadata.json".
- JSON Structure: The structure of your JSON appears to have some issues. For Amazon Kendra S3 data sources, the metadata JSON should follow a specific format. The "_source_uri" field should be at the top level of your JSON, not nested within the "Attributes" object.
- Field Formatting: Some of your fields might not be in the expected format. For example, date fields should be in the ISO 8601 format.
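The naming and date-format points above can be checked locally before a sync. The following is only a minimal sketch: the helper names `metadata_key_for` and `is_iso8601` are hypothetical, and the date check relies on Python's own ISO 8601 parser, which may accept or reject some edge cases differently from Kendra.

```python
from datetime import datetime


def metadata_key_for(document_key: str) -> str:
    """Return the metadata file name expected for a document (hypothetical helper)."""
    return f"{document_key}.metadata.json"


def is_iso8601(value: str) -> bool:
    """Return True if value parses as an ISO 8601 date or timestamp."""
    try:
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
        # so normalise it to an explicit UTC offset first.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


print(metadata_key_for("mydocument.pdf"))
print(is_iso8601("2024-06-06T18:04:14.722Z"))
print(is_iso8601("06/06/2024"))
```

Running every date value in a metadata file through a check like this should flag any that were masked or hand-edited into a non-ISO form.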
Here's a suggested correction to your metadata JSON format:
```json
{
    "Title": "Document title",
    "ContentType": "pdf",
    "Id": "100000241",
    "_source_uri": "Your source URI here",
    "asset_id": "100000241",
    "asset_title": "Some title here",
    "description": "Some description",
    "asset_type": "Brochure:",
    "audience_level_1": "Client",
    "mru_control_id": "1004862-00009-00",
    "persistent_url": "Some URL",
    "date_published": "1970-01-01T00:00:00.000Z",
    "expiration_date": "2026-05-06T17:10:00.000Z",
    "modified_date_time": "2024-06-06T18:04:14.722Z",
    "thumbnail": "Some thumb URL",
    "file_format": "pdf",
    "asset_subtype": "Brochure:",
    "created_date": "2024-03-07T18:12:49.559Z",
    "upload_date_time": "2024-03-07T18:12:49.559Z",
    "keywords": ["keyword1", "keyword2", "keyword3"],
    "type": "assets"
}
```
Also, ensure that all the custom fields you're using in the metadata file have been properly defined in your Kendra index before attempting to sync. If you've created custom fields in your Kendra index, make sure the field names in your metadata JSON match exactly with the field names in your index.
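One way to catch a mismatched field name is to compare the names in the metadata file against the names defined in the index. This is a rough sketch: `undefined_fields` is a hypothetical helper, and the index field list is hard-coded here for illustration. In practice it could be built from the `Name` entries under `DocumentMetadataConfigurations` in the response of the Kendra `DescribeIndex` API.

```python
def undefined_fields(metadata_fields, index_fields):
    """Return metadata field names with no exact (case-sensitive) match in the index."""
    return sorted(set(metadata_fields) - set(index_fields))


# Illustrative values only: the index list would normally come from DescribeIndex.
index_fields = ["asset_id", "asset_title", "keywords"]
metadata_fields = ["asset_id", "Asset_Title", "keywords"]

print(undefined_fields(metadata_fields, index_fields))  # ['Asset_Title']
```

The comparison is deliberately case-sensitive, since the names are expected to match exactly.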
Additionally, check that your IAM role for Kendra has the necessary permissions to access the S3 bucket and its objects. The role should have permissions for actions like s3:GetObject and s3:ListBucket on your S3 bucket.
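As a rough illustration of those two permissions, the sketch below builds a policy document covering only `s3:GetObject` and `s3:ListBucket`. The function name and the bucket name `my-example-bucket` are placeholders, and a real Kendra data source role needs further permissions that are not shown here.

```python
import json


def kendra_s3_read_policy(bucket: str) -> dict:
    """Build a policy with only the two S3 read permissions mentioned above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reading objects applies to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
            {
                # Listing applies to the bucket itself, not to its objects.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }


# "my-example-bucket" is a placeholder bucket name.
print(json.dumps(kendra_s3_read_policy("my-example-bucket"), indent=2))
```

Note the different resource ARNs: a common mistake is to grant `s3:ListBucket` on the `/*` object path, where it has no effect.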
Thanks a lot for answering. About your points:
- Yup, I made sure about the names for the PDF and the metadata JSON.
- Sorry, I had to mask some values before posting here, but I'll remove the dates just for testing.
- That's probably my issue. I didn't know that attribute order mattered, so I'm going to follow the order from the AWS docs.
UPDATE: I changed the JSON format to have _source_uri at the top of the list, but I got the same errors. Also, to create the metadata JSON file, I followed this format, with my fields inside the Attributes object: https://docs.aws.amazon.com/kendra/latest/dg/s3-metadata.html
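Since the same error persists after moving the field, a quick local test can at least rule out a plain syntax problem. The sketch below assumes only what is described above, namely that custom fields sit inside an "Attributes" object. The helper name `check_metadata` is hypothetical, and this is not a full check against Kendra's schema.

```python
import json


def check_metadata(text: str) -> list:
    """Return a list of problems found in the text of a metadata file."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        # Covers trailing commas, unquoted keys, stray characters, and so on.
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    problems = []
    # Assumption: custom fields are nested under "Attributes", as in the linked format.
    if not isinstance(doc.get("Attributes", {}), dict):
        problems.append('"Attributes" must be a JSON object')
    return problems


good = '{"Title": "Document title", "Attributes": {"asset_id": "100000241"}}'
bad = '{"Title": "Document title",}'  # trailing comma

print(check_metadata(good))
print(check_metadata(bad))
```

If the file passes this and the sync still fails, the problem is more likely in a field's type or name than in the JSON syntax itself.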
