Multiple Kinesis Firehose Destinations


A customer wants to use Kinesis to gather and aggregate log data from multiple accounts into a central account. They have two destinations: (1) S3, with Parquet conversion for easy Glue/Athena access, and (2) Elasticsearch, for visualization and common queries. What is the AWS-recommended approach for this, and why?

These are the three approaches that come to mind, but I am looking for more guidance:

A. Use a Kinesis Data Stream as the primary delivery mechanism and have it feed two Kinesis Firehose delivery streams, one for each of the destinations above.

B. Use Kinesis Firehose as the primary delivery mechanism, delivering first into S3 (with Parquet conversion), and use an S3-triggered Lambda function to load the data into Elasticsearch.

C. Use Kinesis Firehose as the primary delivery mechanism, delivering both to S3 (raw, not converted) and to Elasticsearch. Then use an S3-triggered Lambda function to transform the raw S3 data into Parquet format and save it back into S3.

AWS
posted 4 years ago, 3861 views
1 Answer
Accepted Answer

I have spoken with several others about this same question and the answer really boiled down to this:

  1. There are many ways to move data around AWS, and many of them can be the 'right' way depending on factors such as velocity, volume, data sources, and data consumption patterns and tools. In short, there is no blanket 'right' answer; it depends on the specific context.

  2. The first proposed approach (using a Kinesis Data Stream as the primary delivery mechanism and having it feed two Kinesis Firehose delivery streams, one for each required destination) is an acceptable approach and pattern. However, the question that should be answered is: does the customer want to create a 'raw data' bucket of these logs, or is the landed data (in either S3/Parquet or Elasticsearch) an acceptable source of truth?

  3. The other patterns mentioned here are also acceptable. However, each should be reviewed for trade-offs and impacts to ensure that the solution matches the customer's requirements and context (e.g. velocity, volume, data sources, and data consumption patterns and tools).
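
As a rough illustration of the pattern in point 2, here is a minimal sketch in Python of how the two Firehose delivery streams might be defined against one Kinesis Data Stream using boto3's `create_delivery_stream`. Every ARN, stream name, Glue database/table, index name, prefix, and buffer value below is a hypothetical placeholder, and the IAM roles, Glue table, and Elasticsearch domain are assumed to exist already. Treat it as a starting point under those assumptions, not a definitive setup.

```python
"""Sketch: one Kinesis Data Stream feeding two Firehose delivery streams."""

# Hypothetical placeholders; substitute real resources before use.
SOURCE_STREAM_ARN = "arn:aws:kinesis:us-east-1:111122223333:stream/central-logs"
FIREHOSE_ROLE_ARN = "arn:aws:iam::111122223333:role/firehose-delivery-role"
BUCKET_ARN = "arn:aws:s3:::central-log-bucket"
ES_DOMAIN_ARN = "arn:aws:es:us-east-1:111122223333:domain/central-logs"


def kinesis_source():
    """Both delivery streams read from the same Kinesis Data Stream."""
    return {"KinesisStreamARN": SOURCE_STREAM_ARN, "RoleARN": FIREHOSE_ROLE_ARN}


def s3_parquet_stream_params():
    """Delivery stream 1: JSON records converted to Parquet and landed in S3."""
    return {
        "DeliveryStreamName": "logs-to-s3-parquet",
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": kinesis_source(),
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": FIREHOSE_ROLE_ARN,
            "BucketARN": BUCKET_ARN,
            "Prefix": "parquet/",
            # Assumed values; format conversion needs a larger buffer than plain delivery.
            "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},
            "DataFormatConversionConfiguration": {
                "Enabled": True,
                # The schema comes from a Glue table (hypothetical names here).
                "SchemaConfiguration": {
                    "RoleARN": FIREHOSE_ROLE_ARN,
                    "DatabaseName": "logs_db",
                    "TableName": "app_logs",
                    "Region": "us-east-1",
                },
                "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
                "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
            },
        },
    }


def elasticsearch_stream_params(backup_all_documents=False):
    """Delivery stream 2: the same records indexed into Elasticsearch."""
    backup_mode = "AllDocuments" if backup_all_documents else "FailedDocumentsOnly"
    return {
        "DeliveryStreamName": "logs-to-elasticsearch",
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": kinesis_source(),
        "ElasticsearchDestinationConfiguration": {
            "RoleARN": FIREHOSE_ROLE_ARN,
            "DomainARN": ES_DOMAIN_ARN,
            "IndexName": "logs",
            "IndexRotationPeriod": "OneDay",
            "S3BackupMode": backup_mode,
            # An S3 location is required for backup or failed documents.
            "S3Configuration": {
                "RoleARN": FIREHOSE_ROLE_ARN,
                "BucketARN": BUCKET_ARN,
                "Prefix": "es-backup/",
            },
        },
    }


def create_streams(client=None, backup_all_documents=False):
    """Create both delivery streams; pass a client, or one is built with boto3."""
    if client is None:
        import boto3  # imported lazily so the builders above work without boto3

        client = boto3.client("firehose")
    for params in (
        s3_parquet_stream_params(),
        elasticsearch_stream_params(backup_all_documents),
    ):
        client.create_delivery_stream(**params)
```

Calling `create_streams(backup_all_documents=True)` sets `S3BackupMode` to `AllDocuments`, which asks Firehose to also write every source record to the backup S3 location. That is one possible way to keep a raw copy if the answer to the 'raw data' question in point 2 is yes.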

AWS
answered 4 years ago
EXPERT
reviewed a month ago
