How do I automatically sync my data to Amazon Bedrock?
I want to automate my data synchronization for my Amazon Bedrock knowledge base.
Short description
Organizations that use a Retrieval Augmented Generation (RAG)-based approach for their AI applications must keep their knowledge base synchronized with their data. To automate data updates, you can use the StartIngestionJob API.
Prerequisites:
- An AWS account with appropriate permissions.
- Familiarity with the AWS SDK for your preferred programming language.
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.
Resolution
Use the StartIngestionJob API
Complete the following steps:
- Configure your AWS SDK for your preferred programming language. Or, if you use the AWS CLI, then configure the AWS CLI with your credentials.
- To find your knowledge base ID, run the list-knowledge-bases AWS CLI command:
    aws bedrock-agent list-knowledge-bases --region your-region-name
  Note: Replace your-region-name with your AWS Region.
- To find your data source ID, run the list-data-sources AWS CLI command:
    aws bedrock-agent list-data-sources --knowledge-base-id your-knowledge-base-id --region your-region-name
  Note: Replace your-region-name with your AWS Region, and your-knowledge-base-id with your knowledge base ID.
- Run the StartIngestionJob API:
    SDK_BedrockAgent_Client.StartIngestionJob(
        --knowledge-base-id your-knowledge-base-id
        --data-source-id your-data-source-id
    )
  Note: The API call might look different depending on the programming language that you use, or if you use the AWS CLI. The following is an example that uses Python and boto3:
    import boto3
    from botocore.exceptions import ClientError

    def start_ingestion_job(knowledge_base_id, data_source_id):
        # Replace 'your-region' with your AWS Region.
        bedrock = boto3.client('bedrock-agent', region_name='your-region')
        try:
            response = bedrock.start_ingestion_job(
                knowledgeBaseId=knowledge_base_id,
                dataSourceId=data_source_id
            )
            return response
        except ClientError as e:
            print(f"An error occurred: {e}")
            return None

    # Usage
    knowledge_base_id = 'your-knowledge-base-id'
    data_source_id = 'your-data-source-id'
    job_response = start_ingestion_job(knowledge_base_id, data_source_id)
    if job_response:
        print(f"Ingestion job started successfully. Job ID: {job_response['ingestionJob']['ingestionJobId']}")
    else:
        print("Failed to start ingestion job.")
- From the output, note the ingestionJobId.
- To check the status of the ingestion job, run the GetIngestionJob API:
    SDK_BedrockAgent_Client.GetIngestionJob(
        --knowledge-base-id your-knowledge-base-id
        --data-source-id your-data-source-id
        --ingestion-job-id your-ingestion-job-id
    )
  Note: The API call might look different depending on the programming language that you use, or if you use the AWS CLI. The following is an example that uses Python and boto3:
    # Reuses boto3, ClientError, and job_response from the previous example.
    def check_ingestion_job_status(knowledge_base_id, data_source_id, ingestion_job_id):
        bedrock = boto3.client('bedrock-agent', region_name='your-region')
        try:
            response = bedrock.get_ingestion_job(
                knowledgeBaseId=knowledge_base_id,
                dataSourceId=data_source_id,
                ingestionJobId=ingestion_job_id
            )
            return response['ingestionJob']['status']
        except ClientError as e:
            print(f"An error occurred: {e}")
            return None

    # Usage
    if job_response:
        # Take the job ID from the StartIngestionJob response.
        ingestion_job_id = job_response['ingestionJob']['ingestionJobId']
        status = check_ingestion_job_status(knowledge_base_id, data_source_id, ingestion_job_id)
        print(f"Current ingestion job status: {status}")
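If you prefer to look up the knowledge base ID and data source ID from code instead of the AWS CLI, the lookup steps above can be sketched in Python as follows. This is a minimal sketch, not an official sample: the helper names list_knowledge_base_ids and list_data_source_ids are hypothetical, and the code assumes that the boto3 bedrock-agent client returns knowledgeBaseSummaries and dataSourceSummaries lists with an optional nextToken for pagination. The client is passed in as a parameter so that you can create it with your own Region and credentials.

```python
def list_knowledge_base_ids(client):
    """Return the IDs of all knowledge bases that the client can list.

    `client` is assumed to be boto3.client('bedrock-agent', ...).
    Hypothetical helper for illustration only.
    """
    ids, token = [], None
    while True:
        # Pass nextToken only when a previous page returned one.
        kwargs = {"nextToken": token} if token else {}
        page = client.list_knowledge_bases(**kwargs)
        ids.extend(s["knowledgeBaseId"] for s in page.get("knowledgeBaseSummaries", []))
        token = page.get("nextToken")
        if not token:
            return ids


def list_data_source_ids(client, knowledge_base_id):
    """Return the IDs of all data sources in one knowledge base.

    Hypothetical helper for illustration only.
    """
    ids, token = [], None
    while True:
        kwargs = {"knowledgeBaseId": knowledge_base_id}
        if token:
            kwargs["nextToken"] = token
        page = client.list_data_sources(**kwargs)
        ids.extend(s["dataSourceId"] for s in page.get("dataSourceSummaries", []))
        token = page.get("nextToken")
        if not token:
            return ids


# Example usage (requires boto3 and valid AWS credentials):
#   import boto3
#   client = boto3.client("bedrock-agent", region_name="your-region")
#   for kb_id in list_knowledge_base_ids(client):
#       print(kb_id, list_data_source_ids(client, kb_id))
```

Because the helpers only call methods on the object that you pass in, you can exercise them with a stub client before you point them at a real account.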
Use pseudo-code to push your data to your knowledge base
Use pseudo-code to update data from all available data sources in your knowledge base.
Example:
Function StartJob(knowledgeBaseId, dataSourceId)
    Try
        job = BedrockAgentService.StartIngestionJob(knowledgeBaseId, dataSourceId)
        Return job
    Catch Error
        LogError("Failed to start ingestion job for data source: " + dataSourceId)
        Return null

Function GetIngestionJobStatus(knowledgeBaseId, dataSourceId, ingestionJobId)
    Try
        jobStatus = BedrockAgentService.GetIngestionJob(knowledgeBaseId, dataSourceId, ingestionJobId)
        Return jobStatus
    Catch Error
        LogError("Failed to get status for job: " + ingestionJobId)
        Return null

Function RunIngestionJobs(knowledgeBaseId)
    dataSources = BedrockAgentService.ListDataSources(knowledgeBaseId)
    For Each dataSource in dataSources
        job = StartJob(knowledgeBaseId, dataSource.Id)
        If job is not null Then
            LogInfo("Job started successfully for data source: " + dataSource.Id)
            While job.Status is not (Completed or Failed or Stopped)
                Wait for short interval
                job = GetIngestionJobStatus(knowledgeBaseId, dataSource.Id, job.Id)
                If job is null Then
                    Break While loop
            If job is not null Then
                LogInfo("Job completed with status: " + job.Status)
            Else
                LogError("Job monitoring failed for data source: " + dataSource.Id)
        Else
            LogError("Failed to start job for data source: " + dataSource.Id)

Main
    knowledgeBaseId = "<your-knowledge-base-id>"
    RunIngestionJobs(knowledgeBaseId)
The code defines three main functions:
- The StartJob function uses the StartIngestionJob API to start an ingestion job for the knowledge base and data source that you provide.
- The GetIngestionJobStatus function gets the current status of the ingestion job that you provide.
- The RunIngestionJobs function starts and monitors the ingestion jobs for the data sources in the knowledge base that you provide.
Note: If an operation fails, review your error messages for more details.
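The pseudo-code above might translate to Python roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the function name run_ingestion_jobs and its poll_seconds and sleep parameters are hypothetical, the terminal status strings (COMPLETE, FAILED, STOPPED) are assumed from the Amazon Bedrock API, and only the first page of data sources is read. The client is passed in so that you can create it with boto3.client('bedrock-agent') in your own Region.

```python
import time

# Status values assumed to mean that an ingestion job has finished.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def run_ingestion_jobs(client, knowledge_base_id, poll_seconds=10, sleep=time.sleep):
    """Start one ingestion job per data source and wait for each to finish.

    Returns a dict that maps each data source ID to its final status,
    or to None if starting or monitoring the job failed.
    Hypothetical helper; pagination of data sources is not handled.
    """
    results = {}
    sources = client.list_data_sources(knowledgeBaseId=knowledge_base_id)
    for summary in sources.get("dataSourceSummaries", []):
        data_source_id = summary["dataSourceId"]
        try:
            job = client.start_ingestion_job(
                knowledgeBaseId=knowledge_base_id,
                dataSourceId=data_source_id,
            )["ingestionJob"]
            # Poll at a fixed interval until the job reaches a terminal status.
            while job["status"] not in TERMINAL_STATUSES:
                sleep(poll_seconds)
                job = client.get_ingestion_job(
                    knowledgeBaseId=knowledge_base_id,
                    dataSourceId=data_source_id,
                    ingestionJobId=job["ingestionJobId"],
                )["ingestionJob"]
            results[data_source_id] = job["status"]
        except Exception as error:
            # With boto3 you would typically catch botocore.exceptions.ClientError.
            print(f"Ingestion failed for data source {data_source_id}: {error}")
            results[data_source_id] = None
    return results


# Example usage (requires boto3 and valid AWS credentials):
#   import boto3
#   client = boto3.client("bedrock-agent", region_name="your-region")
#   print(run_ingestion_jobs(client, "your-knowledge-base-id"))
```

The jobs run one data source at a time, which mirrors the loop in the pseudo-code and keeps a failure in one data source from stopping the others.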
Follow best practices
To reduce problems when you sync your data to Amazon Bedrock, follow these best practices:
- To manage issues with API calls, implement error handling throughout the process.
- To periodically check a job's status, use a polling mechanism. For more information, see Poll for job status with Lambda and AWS Batch.
- Maintain detailed logs of the ingestion process for troubleshooting and auditing purposes.
- Follow AWS best practices for security.
- Review the data ingestion and storage costs.
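As one way to combine the polling and logging practices above, the following Python sketch polls a status function with capped exponential backoff and an overall timeout. It is an illustration under assumptions rather than a prescribed method: the name wait_for_job and all of its parameters are hypothetical, the logger name kb_sync is arbitrary, and the terminal status strings are assumed from the Amazon Bedrock API. The get_status argument is any zero-argument callable that returns the current status, such as a small wrapper around GetIngestionJob.

```python
import logging
import time

logger = logging.getLogger("kb_sync")  # arbitrary logger name

# Status values assumed to mean that an ingestion job has finished.
TERMINAL = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_job(get_status, initial_delay=5.0, max_delay=60.0, timeout=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal status.

    The wait between polls doubles after each attempt, up to max_delay.
    Raises TimeoutError if the next wait would pass the overall timeout.
    Hypothetical helper for illustration only.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        logger.info("Ingestion job status: %s", status)
        if status in TERMINAL:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"Job still {status} after {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)


# Example usage with a boto3 bedrock-agent client (not created here):
#   status = wait_for_job(lambda: client.get_ingestion_job(
#       knowledgeBaseId="your-knowledge-base-id",
#       dataSourceId="your-data-source-id",
#       ingestionJobId="your-ingestion-job-id",
#   )["ingestionJob"]["status"])
```

Backing off between polls reduces the number of API calls for long-running jobs, and the timeout keeps a stuck job from blocking the caller indefinitely.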