I want to view the list of URLs that the Web Crawler adds to the data source in my Amazon Bedrock knowledge base.
Short description
To view the URLs that the Web Crawler adds to the data source, first configure log delivery for your knowledge base so that ingestion jobs are logged. The ingestion logs record each URL that the Web Crawler adds to the data source. After an ingestion job completes, use Amazon CloudWatch Logs Insights to analyze the log data.
Resolution
Complete the following steps:
- Update your knowledge base to add CloudWatch Logs as the delivery option.
- Sync your Web Crawler data source.
- In the Sync history section, select the target data source, and then note the ingestion job ID.
- Open the CloudWatch console, and then select the log group that you configured as the log delivery destination for your knowledge base.
- Set the time range to cover the period when the sync job ran.
- Choose View in Logs Insights, and then enter the following query:
fields event.document_location.web_location.url | filter @message like 'IngestionJobId' and event.status like 'INDEXED'
Note: Replace IngestionJobId with your ingestion job ID.
- Choose Run query to view the list of URLs.
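The console steps above can also be scripted. The following is a minimal sketch, not an official procedure, that runs the same Logs Insights query with the AWS SDK for Python (Boto3). The function names, the `hours_back` parameter, and the example log group name are hypothetical. The sketch also assumes that each result row reports the URL under the same field name that the query uses.

```python
import time
from datetime import datetime, timedelta, timezone

# Assumption: Logs Insights labels the result column with the field name from the query.
URL_FIELD = "event.document_location.web_location.url"


def build_query(ingestion_job_id: str) -> str:
    """Return the Logs Insights query from the steps above for one ingestion job."""
    return (
        f"fields {URL_FIELD} "
        f"| filter @message like '{ingestion_job_id}' "
        "and event.status like 'INDEXED'"
    )


def extract_urls(results: list) -> list:
    """Collect the URL value from each result row, keeping order and skipping duplicates."""
    urls = []
    for row in results:
        for cell in row:
            if cell["field"] == URL_FIELD and cell["value"] not in urls:
                urls.append(cell["value"])
    return urls


def list_indexed_urls(log_group: str, ingestion_job_id: str, hours_back: int = 24) -> list:
    """Run the query against the given log group and return the indexed URLs."""
    import boto3  # imported here so the helpers above work without the AWS SDK installed

    logs = boto3.client("logs")
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours_back)  # the window must cover the sync job

    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=int(start.timestamp()),
        endTime=int(end.timestamp()),
        queryString=build_query(ingestion_job_id),
        limit=10000,
    )["queryId"]

    # Poll until the query reaches a terminal state.
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        time.sleep(1)

    if response["status"] != "Complete":
        raise RuntimeError(f"Query ended with status {response['status']}")
    return extract_urls(response["results"])


if __name__ == "__main__":
    # Hypothetical values: replace them with your own log group and ingestion job ID.
    for url in list_indexed_urls("/example/knowledge-base-logs", "IngestionJobId"):
        print(url)
```

If the sync job ran more than 24 hours ago, increase `hours_back` so that the query window includes the job.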