
Picking the correct OpenSearch index date from a Kinesis Data Firehose delivery stream


When using Kinesis Data Firehose to deliver to OpenSearch / Elasticsearch, the delivery stream is super convenient, but one major limitation I've found is that you cannot override the timestamp used to decide the destination index: it always uses the estimated arrival time. This means that for backfill jobs (which are super common for us), all the data ends up in the current daily index. That reduces read/write scalability and makes index-based management much more difficult (e.g., compacting or archiving indices older than N days). It also hurts dashboard queries: some indices end up holding data for a large range of dates and become a performance bottleneck.

Ideally, one could pick a timestamp field from the events, or set it during the Lambda processing, so that a record with date D goes to the index for date D. Thanks for any suggestions!
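To make the difference concrete, here is a minimal Python sketch (plain Python, not a Firehose or OpenSearch API) contrasting arrival-time routing with the desired event-time routing. It assumes daily index rotation with a `-YYYY-MM-DD` suffix, a hypothetical `event_time` field holding ISO-8601 timestamps with an explicit offset, and illustrative helper names (`daily_index`, `index_by_arrival`, `index_by_event`) that are not part of any AWS service.

```python
from datetime import datetime, timezone

# Hypothetical field name: the post does not say which event field holds the timestamp.
EVENT_TS_FIELD = "event_time"


def daily_index(prefix: str, ts: datetime) -> str:
    """Build a daily index name such as 'logs-2023-01-31' (assumed rotation format)."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y-%m-%d}"


def index_by_arrival(prefix: str, arrival: datetime) -> str:
    # Behavior described in the post: the index depends only on when the
    # record arrives, not on the record's own timestamp.
    return daily_index(prefix, arrival)


def index_by_event(prefix: str, record: dict) -> str:
    # Desired behavior: derive the index from the record's own timestamp.
    ts = datetime.fromisoformat(record[EVENT_TS_FIELD])
    return daily_index(prefix, ts)


# A backfill batch covering several days, all delivered at the same moment.
backfill = [
    {"event_time": "2023-01-01T10:00:00+00:00", "msg": "a"},
    {"event_time": "2023-01-02T23:30:00+00:00", "msg": "b"},
    {"event_time": "2023-01-05T08:15:00+00:00", "msg": "c"},
]
arrival = datetime(2023, 3, 1, 12, 0, tzinfo=timezone.utc)

print({index_by_arrival("logs", arrival) for _ in backfill})
# {'logs-2023-03-01'}
print(sorted({index_by_event("logs", r) for r in backfill}))
# ['logs-2023-01-01', 'logs-2023-01-02', 'logs-2023-01-05']
```

With arrival-time routing, the whole backfill collapses into the single index for the delivery date; with event-time routing, each record lands in the index matching its own date, which is what makes per-day retention and archiving work.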