Multiple Kinesis Firehose Destinations

1

A customer wants to use Kinesis for gathering and aggregating log data from multiple accounts into a central account. They have two destinations - 1. S3 (using parquet transformation for easy Glue/Athena access), 2. ElasticSearch (for visualization and common queries). The question is what is the AWS recommended approach for this and why?

These are the three approaches that come to mind, however looking for more guidance:

A. Using Kinesis Data Stream as the primary delivery mechanism, and use that to feed two Kinesis Firehose streams - each one targeted at one of the destinations (see above).

B. Using Kinesis Firehose as the primary delivery mechanism, targeting initial delivery into S3 (with parquet conversion), and using an S3/Lambda trigger to load the data into ElasticSearch.

C. Using Kinesis Firehose as the primary delivery mechanism, targeting both S3 (raw - not converted) delivery and ElasticSearch delivery. Then using an S3/Lambda trigger to transform the raw S3 data into parquet format saved back into S3.

AWS
질문됨 4년 전3845회 조회
1개 답변
1
수락된 답변

I have spoken with several others about this same question and the answer really boiled down to this:

  1. There are many ways to move data around AWS, and many of them can be the 'right' way depending on several factors such as velocity, volume, data sources, data consumption patterns and tools, and more. In short - there is no blanket 'right' answer, it will depend on the specific context.

  2. The initial proposed approach (Using Kinesis Data Stream as the primary delivery mechanism, and use that to feed two Kinesis Firehose streams - each one targeted at one of the destinations required) is an acceptable approach and pattern. However the question that should be answered is: does the customer want to create a 'raw data' bucket of these logs, or is the landed data (in either S3/parquet or Elasticsearch) the acceptable source of truth.

  3. The other patterns mentioned here by others are also acceptable patterns, however each should be reviewed for trade-offs and impacts to ensure that the solution matches the customer requirements and context (i.e. velocity, volume, data sources, data consumption patterns and tools, and more).

AWS
답변함 4년 전
profile picture
전문가
검토됨 한 달 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠