Multiple Kinesis Firehose Destinations


A customer wants to use Kinesis to gather and aggregate log data from multiple accounts into a central account. They have two destinations: 1. S3 (with Parquet conversion for easy Glue/Athena access), and 2. ElasticSearch (for visualization and common queries). What is the AWS-recommended approach for this, and why?

These are the three approaches that come to mind, however looking for more guidance:

A. Using a Kinesis Data Stream as the primary delivery mechanism, and using it to feed two Kinesis Firehose delivery streams, each targeting one of the destinations above (a code sketch follows this list).

B. Using Kinesis Firehose as the primary delivery mechanism, targeting initial delivery into S3 (with parquet conversion), and using an S3/Lambda trigger to load the data into ElasticSearch.

C. Using Kinesis Firehose as the primary delivery mechanism, targeting both S3 delivery (raw, not converted) and ElasticSearch delivery, then using an S3/Lambda trigger to transform the raw S3 data into Parquet format and save it back into S3.
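For concreteness, here is a minimal boto3 sketch of option A: one central Kinesis Data Stream fanned out to two Firehose delivery streams, one converting records to Parquet in S3 using a Glue table as the schema, and one indexing into an Elasticsearch domain. All names, ARNs, the Glue database/table, and the IAM role are placeholders I invented for illustration, and the cross-account permissions the source accounts need to write into the central stream are not shown.

```python
# Sketch of option A: one central Kinesis Data Stream fanned out to two
# Kinesis Firehose delivery streams (S3/Parquet and Elasticsearch).
# All names, ARNs, and the Glue table are placeholders; the IAM role is
# assumed to already exist with the required permissions.
import boto3

kinesis = boto3.client("kinesis")
firehose = boto3.client("firehose")

STREAM_NAME = "central-log-stream"
STREAM_ARN = "arn:aws:kinesis:us-east-1:111111111111:stream/central-log-stream"
DELIVERY_ROLE = "arn:aws:iam::111111111111:role/firehose-delivery-role"

# 1. The central data stream that the source accounts write into.
kinesis.create_stream(StreamName=STREAM_NAME, ShardCount=2)
kinesis.get_waiter("stream_exists").wait(StreamName=STREAM_NAME)

# 2. Firehose #1: reads from the data stream, converts JSON records to
#    Parquet using a Glue table as the schema, and lands them in S3.
firehose.create_delivery_stream(
    DeliveryStreamName="logs-to-s3-parquet",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": STREAM_ARN,
        "RoleARN": DELIVERY_ROLE,
    },
    ExtendedS3DestinationConfiguration={
        "RoleARN": DELIVERY_ROLE,
        "BucketARN": "arn:aws:s3:::central-log-archive",
        "Prefix": "logs/",
        # Parquet conversion requires a buffer size of at least 64 MB.
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
            "SchemaConfiguration": {
                "RoleARN": DELIVERY_ROLE,
                "DatabaseName": "logs_db",   # hypothetical Glue database
                "TableName": "raw_logs",     # hypothetical Glue table
                "Region": "us-east-1",
            },
        },
    },
)

# 3. Firehose #2: reads from the same data stream and indexes the records
#    into an Elasticsearch domain, with an S3 bucket for failed documents.
firehose.create_delivery_stream(
    DeliveryStreamName="logs-to-elasticsearch",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": STREAM_ARN,
        "RoleARN": DELIVERY_ROLE,
    },
    ElasticsearchDestinationConfiguration={
        "RoleARN": DELIVERY_ROLE,
        "DomainARN": "arn:aws:es:us-east-1:111111111111:domain/log-analytics",
        "IndexName": "logs",
        "IndexRotationPeriod": "OneDay",
        "S3BackupMode": "FailedDocumentsOnly",
        "S3Configuration": {
            "RoleARN": DELIVERY_ROLE,
            "BucketARN": "arn:aws:s3:::central-log-firehose-errors",
        },
    },
)
```

One design note on this sketch: the Elasticsearch-targeted stream could instead set S3BackupMode to "AllDocuments" to keep a raw copy of every source record in S3, which is one way to address the "raw data" question raised in the answer below.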

AWS
Asked 4 years ago · 3582 views
1 Answer
Accepted Answer

I have spoken with several others about this same question and the answer really boiled down to this:

  1. There are many ways to move data around AWS, and many of them can be the 'right' way depending on factors such as velocity, volume, data sources, data consumption patterns and tools, and more. In short, there is no blanket 'right' answer; it will depend on the specific context.

  2. The initially proposed approach (using a Kinesis Data Stream as the primary delivery mechanism and using it to feed two Kinesis Firehose streams, each targeting one of the required destinations) is an acceptable approach and pattern. However, the question that should be answered is: does the customer want to create a 'raw data' bucket of these logs, or is the landed data (in either S3/Parquet or Elasticsearch) the acceptable source of truth?

  3. The other patterns mentioned here are also acceptable; however, each should be reviewed for trade-offs and impacts to ensure that the solution matches the customer's requirements and context (e.g. velocity, volume, data sources, data consumption patterns and tools, and more). As one example, the S3/Lambda conversion step from option C is sketched below.
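To make that trade-off discussion more concrete, here is a minimal sketch of the S3/Lambda trigger from option C: a Lambda function fired by S3 object-created events that rewrites a raw, newline-delimited JSON object as Parquet under a different prefix. This is an illustration only, not a recommended implementation: the prefixes are placeholders, it assumes Firehose delivers uncompressed newline-delimited JSON, and it assumes pandas and pyarrow are packaged with the function (for example as a Lambda layer).

```python
# Sketch of option C's conversion step: an S3-triggered Lambda that reads a
# newly landed raw object and writes a Parquet copy back to the same bucket.
# Prefixes are placeholders; pandas/pyarrow are assumed to be available.
import io
import urllib.parse

import boto3
import pandas as pd

s3 = boto3.client("s3")

RAW_PREFIX = "raw/"          # where Firehose lands the unconverted data
PARQUET_PREFIX = "parquet/"  # where the converted copies go


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read the raw object (assumed to be newline-delimited JSON).
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        df = pd.read_json(io.BytesIO(body), lines=True)

        # Write it back as Parquet under the converted prefix.
        buf = io.BytesIO()
        df.to_parquet(buf, index=False)
        out_key = key.replace(RAW_PREFIX, PARQUET_PREFIX, 1) + ".parquet"
        s3.put_object(Bucket=bucket, Key=out_key, Body=buf.getvalue())
```

Whether this extra Lambda hop is worth it is exactly the kind of trade-off mentioned above: Firehose's built-in record format conversion (as in options A and B) avoids the function entirely, at the cost of not keeping a raw copy unless you also enable S3 backup.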

AWS
Answered 4 years ago
