Kendra index stays stuck in "Syncing - indexing"

I have created a Kendra index. The data source is an S3 bucket that contains a few PDF files. The total size of the data is about 20 MB or less.

Once I start a sync job, it crawls the source and then starts indexing. In the sync run history, the total number of documents added is the same as the total number of documents scanned.

I can search the indexed documents and the results are fine.

But after 2 hours it is still in the "Syncing - indexing" state. Why doesn't it stop?

Will I be billed for this time?

CloudWatch says: connector has successfully completed syncing document

Please note that the IAM permissions are all set and are not causing the problem. I have tried with administrator access too.

3 Answers

Synchronization times in Kendra can vary based on a number of factors, such as:

  • Number and size of documents: Syncing a large number of big documents will take longer than a small number of short documents.
  • Initial sync vs incremental sync: The initial full sync of an entire S3 bucket into Kendra will take longer than subsequent incremental syncs of updates/additions.
  • Load on Kendra: If there are many indexes syncing data simultaneously, it may take more time compared to low usage periods.
  • Index size: Bigger indexes with more data, users and query capacity take longer to update than smaller indexes.

As a rough estimate, it can take anywhere from minutes to hours for Amazon Kendra to fully sync contents from an S3 bucket depending on the above factors. I recommend checking the status in the AWS console to see if a sync has completed (see methods below).

For both the Developer and Enterprise editions of Kendra, there is a $0.35 per hour connector charge while synchronization is being performed. Once the sync job is marked as "COMPLETE" via either of the methods below, you are no longer charged for connector usage.
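To put the connector charge in perspective, here is a minimal sketch of the arithmetic in Python. It assumes the $0.35 per hour rate quoted above and simple linear billing by sync duration. The function name and the example timestamps are hypothetical, and any rounding rules or per-document scanning charges that AWS applies are not modelled, so treat the result as a rough estimate only.

```python
from datetime import datetime, timedelta

# Rate quoted in this answer; check the current Amazon Kendra pricing page.
CONNECTOR_RATE_USD_PER_HOUR = 0.35


def estimate_connector_charge(start, end, rate=CONNECTOR_RATE_USD_PER_HOUR):
    """Estimate the connector charge for one sync run.

    Assumes linear billing by elapsed time (an assumption, not a
    statement of how AWS rounds usage).
    """
    hours = (end - start).total_seconds() / 3600
    return round(hours * rate, 2)


# Hypothetical sync run that has been going for 2 hours.
start = datetime(2024, 1, 1, 9, 0)
end = start + timedelta(hours=2)
print(estimate_connector_charge(start, end))
```

Under these assumptions, a sync that stays in progress for 2 hours works out to about $0.70 of connector usage.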

To monitor the sync status of your Kendra index, you can use the Amazon Kendra console or AWS CLI.

Using the console:

  1. Sign in to the AWS Management Console and open the Amazon Kendra console.
  2. From the list of indexes, select the index you want to monitor.
  3. Choose the "Data sources" option from the left menu.
  4. Select the data source and scroll down to view the sync run history and metrics.

This shows details of previous syncs, such as the start and end time and the number of documents added, deleted, or failed. You can also view the total number of documents indexed from that source.

Using the AWS CLI:

aws kendra list-data-source-sync-jobs --id <data-source-id> --index-id <index-id> --region <region>

Where:

  • <data-source-id> is the ID of the data source
  • <index-id> is the ID of the Kendra index
  • <region> is the AWS region

This will list the status of ongoing or recent sync jobs for the given data source.
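If you save the command's JSON output, a short script can show whether a job is still in progress. The sketch below assumes the response has a top-level `History` list whose entries carry `ExecutionId`, `Status`, and a `Metrics` object with string counters, and that `SYNCING`, `SYNCING_INDEXING`, and `STOPPING` are the in-progress statuses. The function name and the sample payload are hypothetical and only illustrate the shape.

```python
import json

# Statuses treated as "still running" in this sketch (assumption).
IN_PROGRESS = {"SYNCING", "SYNCING_INDEXING", "STOPPING"}


def summarize_sync_jobs(cli_output):
    """Turn list-data-source-sync-jobs JSON output into simple rows."""
    history = json.loads(cli_output).get("History", [])
    rows = []
    for job in history:
        metrics = job.get("Metrics", {})
        status = job.get("Status")
        rows.append({
            "execution_id": job.get("ExecutionId"),
            "status": status,
            "in_progress": status in IN_PROGRESS,
            # Metric values are kept as returned (strings in this sample).
            "scanned": metrics.get("DocumentsScanned"),
            "added": metrics.get("DocumentsAdded"),
            "failed": metrics.get("DocumentsFailed"),
        })
    return rows


# Hypothetical sample payload, not real output from any account.
sample = """
{"History": [{"ExecutionId": "example-execution-id",
              "Status": "SYNCING_INDEXING",
              "Metrics": {"DocumentsScanned": "12",
                          "DocumentsAdded": "12",
                          "DocumentsFailed": "0"}}]}
"""

for row in summarize_sync_jobs(sample):
    print(row)
```

A row with `in_progress` set to True and `added` equal to `scanned` matches the situation described in the question: all documents are indexed, but the job has not yet reported a final status.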

Sources

[1] Monitoring your index (console) - Amazon Kendra (https://docs.aws.amazon.com/kendra/latest/dg/monitoring-runsync.html)

[2] Creating an Amazon Kendra index and ingesting the metadata - Amazon Kendra (https://docs.aws.amazon.com/kendra/latest/dg/tutorial-search-metadata-create-index-ingest.html)

Please accept answer if helpful!

AWS
answered 3 months ago
  • Could it be an issue with the sync job itself? The data is no more than 20 MB, there are no parallel sync jobs, and Kendra does not have any prior data.

I have the same issue! Indexing seems to be stuck.

Luuk6
answered 3 months ago

In my case, a Kendra scan of 20 text files, each 1 kB, took 8 minutes. I scanned the same source a few times and the scan time was consistent.

Daniel
answered 3 months ago
