How to set the starting position for a Kinesis Delivery Stream


When hooking up a Lambda function to a Kinesis stream, the event source mapping lets you choose a starting position. Settings such as TRIM_HORIZON and timestamps make it possible to replay data that is already in the stream. With a delivery stream this setting does not seem to be available, and it only reads data from the latest record onward. Even creating a brand-new Firehose does not make it possible to replay the data. This is a problem when the data is difficult or expensive to generate again. For example, a couple of times I had the Firehose set up incorrectly and had to rerun the whole expensive backend job to process the data again. Am I missing something?

edgy42
asked 2 years ago · 1617 views
1 Answer

Yes, you are right. If Kinesis Data Streams (KDS) is used as the source for Kinesis Data Firehose (KDF), then KDF starts reading from the latest position in the KDS stream. This is documented here - https://docs.aws.amazon.com/firehose/latest/dev/writing-with-kinesis-streams.html - "Kinesis Data Firehose starts reading data from the LATEST position of your Kinesis stream."

It is also mentioned in the FAQ - https://aws.amazon.com/kinesis/data-firehose/faqs/

Q: From where does Kinesis Data Firehose read data when my Kinesis Data Stream is configured as the source of my delivery stream?

Kinesis Data Firehose starts reading data from the LATEST position of your Kinesis Data Stream when it’s configured as the source of a delivery stream.

If you want to read records in the KDS stream from the beginning or from a particular position, one option is to configure a Lambda function that reads from the position you are interested in and then puts the records to KDF using the SDK.
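The same idea can also be run as a one-off script instead of a Lambda function. The following is a minimal sketch in Python with boto3, not a tested production tool. It assumes the target delivery stream uses Direct PUT as its source, because the Firehose PutRecord and PutRecordBatch operations are disabled on a delivery stream that has a Kinesis data stream as its source. The names `replay_shard`, `main`, `my-stream`, and `my-direct-put-firehose` are hypothetical.

```python
import time


def replay_shard(kinesis, firehose, stream_name, shard_id, delivery_stream,
                 iterator_type="TRIM_HORIZON", timestamp=None, pause=0.2):
    """Copy one shard's records, from the chosen position, into a delivery stream.

    `kinesis` and `firehose` are boto3 clients (or objects with the same methods).
    Returns the number of records forwarded.
    """
    args = {
        "StreamName": stream_name,
        "ShardId": shard_id,
        "ShardIteratorType": iterator_type,  # e.g. TRIM_HORIZON or AT_TIMESTAMP
    }
    if iterator_type == "AT_TIMESTAMP":
        args["Timestamp"] = timestamp  # a datetime.datetime
    iterator = kinesis.get_shard_iterator(**args)["ShardIterator"]

    forwarded = 0
    while iterator:
        # PutRecordBatch accepts at most 500 records per call, so read in
        # pages of 500. Assumption: a page stays under the 4 MiB batch limit.
        resp = kinesis.get_records(ShardIterator=iterator, Limit=500)
        batch = [{"Data": record["Data"]} for record in resp["Records"]]
        if batch:
            result = firehose.put_record_batch(
                DeliveryStreamName=delivery_stream, Records=batch
            )
            if result["FailedPutCount"]:
                # A real tool would retry only the failed entries.
                raise RuntimeError(f"{result['FailedPutCount']} records failed")
            forwarded += len(batch)
        if resp.get("MillisBehindLatest") == 0:
            break  # caught up with the tip of the shard
        iterator = resp.get("NextShardIterator")  # None once a closed shard ends
        time.sleep(pause)  # stay under the per-shard GetRecords rate limit
    return forwarded


def main():
    # Imported here so the function above can be exercised without boto3.
    import boto3

    kinesis = boto3.client("kinesis")
    firehose = boto3.client("firehose")
    # Hypothetical names. list_shards pagination and parent/child shard
    # ordering after a reshard are ignored in this sketch.
    for shard in kinesis.list_shards(StreamName="my-stream")["Shards"]:
        count = replay_shard(kinesis, firehose, "my-stream",
                             shard["ShardId"], "my-direct-put-firehose")
        print(shard["ShardId"], count)
```

Calling `main()` replays every shard from TRIM_HORIZON. To start from a point in time instead, pass `iterator_type="AT_TIMESTAMP"` together with a `timestamp`. A replay is only possible for records that are still within the stream's retention period.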

AWS
EXPERT
answered 2 years ago
  • @edgy42 - If my response has helped you, could you please accept my answer? Thanks.
