Multiple Kinesis Firehose Destinations


A customer wants to use Kinesis to gather and aggregate log data from multiple accounts into a central account. They have two destinations: 1. S3 (using Parquet conversion for easy Glue/Athena access), and 2. Elasticsearch (for visualization and common queries). What is the AWS-recommended approach, and why?

Three approaches come to mind; however, I am looking for more guidance:

A. Use a Kinesis Data Stream as the primary delivery mechanism, and use it to feed two Kinesis Data Firehose streams, each targeting one of the destinations above.

B. Use Kinesis Data Firehose as the primary delivery mechanism, targeting initial delivery into S3 (with Parquet conversion), and use an S3/Lambda trigger to load the data into Elasticsearch.

C. Use Kinesis Data Firehose as the primary delivery mechanism, targeting both S3 (raw, not converted) and Elasticsearch. Then use an S3/Lambda trigger to transform the raw S3 data into Parquet and save it back into S3.
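For concreteness, option A amounts to two Firehose delivery streams that both read from the same Kinesis Data Stream. The sketch below only builds the parameter dicts for the Firehose `CreateDeliveryStream` API without calling AWS; the stream names, bucket ARN, and domain ARN are hypothetical placeholders, and the input-format/Glue-schema configuration required for Parquet conversion is elided:

```python
def build_firehose_configs(source_stream_arn, role_arn):
    """Build CreateDeliveryStream parameter dicts for option A:
    one Kinesis Data Stream fanned out to two Firehose delivery streams."""
    kinesis_source = {
        "KinesisStreamARN": source_stream_arn,
        "RoleARN": role_arn,
    }
    # Destination 1: S3 with record format conversion to Parquet
    s3_parquet = {
        "DeliveryStreamName": "central-logs-to-s3-parquet",  # hypothetical
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": kinesis_source,
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": "arn:aws:s3:::central-log-bucket",  # hypothetical
            "DataFormatConversionConfiguration": {
                "Enabled": True,
                "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
                # InputFormatConfiguration and SchemaConfiguration
                # (Glue table) omitted for brevity
            },
        },
    }
    # Destination 2: Elasticsearch for visualization and common queries
    es = {
        "DeliveryStreamName": "central-logs-to-es",  # hypothetical
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": kinesis_source,
        "ElasticsearchDestinationConfiguration": {
            "RoleARN": role_arn,
            "DomainARN": "arn:aws:es:us-east-1:111122223333:domain/central-logs",  # hypothetical
            "IndexName": "logs",
        },
    }
    return s3_parquet, es

# Each dict would be passed to firehose.create_delivery_stream(**cfg)
# via a boto3 Firehose client; not invoked here.
```

Because both delivery streams share the same `KinesisStreamSourceConfiguration`, each independently consumes the full stream, which is what gives the fan-out to two destinations.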

AWS
Asked 4 years ago · Viewed 3,590 times
1 answer

Accepted Answer

I have spoken with several others about this same question, and the answer really boiled down to this:

  1. There are many ways to move data around AWS, and many of them can be the 'right' way depending on several factors such as velocity, volume, data sources, data consumption patterns and tools, and more. In short, there is no blanket 'right' answer; it will depend on the specific context.

  2. The initially proposed approach (using a Kinesis Data Stream as the primary delivery mechanism and feeding two Kinesis Data Firehose streams, each targeting one of the required destinations) is an acceptable approach and pattern. However, the question that should be answered is: does the customer want to keep a 'raw data' bucket of these logs, or is the landed data (in either S3/Parquet or Elasticsearch) the acceptable source of truth?

  3. The other patterns mentioned here by others are also acceptable patterns; however, each should be reviewed for trade-offs and impacts to ensure that the solution matches the customer requirements and context (i.e., velocity, volume, data sources, data consumption patterns and tools, and more).
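On the 'raw data' question in point 2: the Firehose Elasticsearch destination can keep an untransformed copy of every incoming record in S3 by setting `S3BackupMode` to `AllDocuments`, which provides a raw source-of-truth bucket without a separate pipeline. A minimal sketch of that destination configuration, with hypothetical ARNs and index name:

```python
def es_destination_with_raw_backup(role_arn, domain_arn, backup_bucket_arn):
    """Firehose ElasticsearchDestinationConfiguration that also backs up
    every source record, untransformed, to an S3 'raw data' bucket."""
    return {
        "RoleARN": role_arn,
        "DomainARN": domain_arn,
        "IndexName": "logs",  # hypothetical index name
        # AllDocuments backs up every record; the default,
        # FailedDocumentsOnly, backs up only records that fail delivery.
        "S3BackupMode": "AllDocuments",
        "S3Configuration": {
            "RoleARN": role_arn,
            "BucketARN": backup_bucket_arn,
        },
    }

# Passed as ElasticsearchDestinationConfiguration to
# firehose.create_delivery_stream(...); not invoked here.
```

Note that the backed-up records are the raw stream payloads, so anything destined for Parquet/Athena would still need its own conversion path, as in the patterns discussed above.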

AWS
Answered 4 years ago
