Multiple Kinesis Firehose Destinations

A customer wants to use Kinesis to gather and aggregate log data from multiple accounts into a central account. There are two destinations: (1) S3, with Parquet conversion for easy Glue/Athena access, and (2) Elasticsearch, for visualization and common queries. What is the AWS-recommended approach for this, and why?

Three approaches come to mind, but I am looking for more guidance:

A. Use a Kinesis Data Stream as the primary delivery mechanism and have it feed two Kinesis Firehose delivery streams, one for each of the destinations above.

B. Use Kinesis Firehose as the primary delivery mechanism, delivering first into S3 (with Parquet conversion), and use an S3-triggered Lambda function to load the data into Elasticsearch.

C. Use Kinesis Firehose as the primary delivery mechanism, delivering to both S3 (raw, not converted) and Elasticsearch. Then use an S3-triggered Lambda function to transform the raw S3 data into Parquet and save it back into S3.
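
To make option A concrete, here is a minimal sketch of the two delivery stream definitions, written as keyword arguments for boto3's `create_delivery_stream`. The stream names, Glue database and table, index name, and the single shared IAM role are all hypothetical placeholders, and it assumes the logs arrive as JSON so that Firehose can convert them to Parquet against a Glue table schema. Treat it as an illustration of the fan-out shape, not a reviewed configuration.

```python
def fan_out_configs(stream_arn, role_arn, bucket_arn, domain_arn):
    """Build kwargs for two Firehose delivery streams that read the same
    Kinesis data stream: one lands Parquet in S3, one indexes into Elasticsearch."""
    # Both delivery streams use the same Kinesis data stream as their source.
    source = {
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": stream_arn,
            "RoleARN": role_arn,
        },
    }
    to_s3 = {
        "DeliveryStreamName": "logs-to-s3-parquet",  # hypothetical name
        **source,
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            # Format conversion needs a larger buffer than the plain S3 default.
            "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},
            "DataFormatConversionConfiguration": {
                "Enabled": True,
                "SchemaConfiguration": {
                    "RoleARN": role_arn,
                    "DatabaseName": "logs_db",  # hypothetical Glue database
                    "TableName": "app_logs",    # hypothetical Glue table
                },
                # Assumes JSON input records.
                "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
                "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
            },
        },
    }
    to_es = {
        "DeliveryStreamName": "logs-to-elasticsearch",  # hypothetical name
        **source,
        "ElasticsearchDestinationConfiguration": {
            "RoleARN": role_arn,
            "DomainARN": domain_arn,
            "IndexName": "app-logs",  # hypothetical index
            # Firehose requires an S3 location for documents it cannot index.
            "S3Configuration": {"RoleARN": role_arn, "BucketARN": bucket_arn},
        },
    }
    return to_s3, to_es


# Usage (not executed here; requires boto3 and AWS credentials):
#   import boto3
#   firehose = boto3.client("firehose")
#   for kwargs in fan_out_configs(stream_arn, role_arn, bucket_arn, domain_arn):
#       firehose.create_delivery_stream(**kwargs)
```

In a real deployment each delivery stream would more likely get its own narrowly scoped role; one role is used here only to keep the sketch short.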

AWS
Asked 4 years ago, 3804 views
1 Answer
Accepted Answer

I have spoken with several others about this same question, and the answer really boils down to this:

  1. There are many ways to move data around AWS, and many of them can be the 'right' way depending on factors such as velocity, volume, data sources, and data consumption patterns and tools. In short, there is no blanket 'right' answer; it depends on the specific context.

  2. The initially proposed approach (a Kinesis Data Stream as the primary delivery mechanism, feeding two Kinesis Firehose delivery streams, one for each required destination) is an acceptable pattern. However, the question to answer first is: does the customer want a 'raw data' bucket of these logs, or is the landed data (in either S3/Parquet or Elasticsearch) an acceptable source of truth?

  3. The other patterns mentioned here by others are also acceptable. However, each should be reviewed for trade-offs and impacts to ensure that the solution matches the customer's requirements and context (velocity, volume, data sources, data consumption patterns and tools, and so on).
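
On the 'raw data' question in point 2: if a raw copy is wanted, one option worth evaluating is the S3 backup setting on the Firehose Elasticsearch destination, which can write either every document or only the failed ones to a bucket. The sketch below shows that single switch. The function name and its parameters are hypothetical; only the `ElasticsearchDestinationConfiguration` keys come from the Firehose API.

```python
def elasticsearch_destination(role_arn, domain_arn, index_name,
                              backup_bucket_arn, keep_raw_copy):
    """Return an ElasticsearchDestinationConfiguration dict.

    keep_raw_copy=True asks Firehose to back up every document to S3,
    which yields a raw-data bucket alongside the Elasticsearch index.
    keep_raw_copy=False backs up only documents that failed to index.
    """
    return {
        "RoleARN": role_arn,
        "DomainARN": domain_arn,
        "IndexName": index_name,
        "S3BackupMode": "AllDocuments" if keep_raw_copy else "FailedDocumentsOnly",
        "S3Configuration": {"RoleARN": role_arn, "BucketARN": backup_bucket_arn},
    }
```

With `keep_raw_copy=True` the backup bucket holds the unconverted records, so whether that bucket or the Parquet output is treated as the source of truth remains a decision for the customer.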

AWS
Answered 4 years ago
Expert
Reviewed 23 days ago
