Ingesting data from external sources such as Git, Slack, Zoom, Instagram, and other third-party systems


Problem

I want to check and correct my understanding of, and approach to, setting up a data ingestion pipeline that collects "events" or "data" from any external (third-party) application source.

The ingestion rate is about 5,000 (5K) events per day under normal load; at peak it can go slightly above 20K.

Approach I have been thinking about

I am planning to set up an AWS Lambda endpoint to which external systems can send data via HTTP POST; the data is then loaded into OpenSearch to form a data lake.

The ingestion pipeline operates between Lambda and OpenSearch and performs the following steps:

  • Parse the data
  • Fetch more data if needed, by making API calls
  • Process, transform, and enrich
  • Post to the appropriate OpenSearch indices
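
The steps listed above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the function names, the `source` field, the `events-<source>` index naming scheme, and the `send_bulk` / `fetch_extra` callbacks are all assumptions made for the example. The only AWS/OpenSearch-specific parts are the HTTP event shape (`body`, `isBase64Encoded`) and the newline-delimited format of the OpenSearch `_bulk` API.

```python
import base64
import json
from datetime import datetime, timezone


def parse_event(event):
    """Step 1: extract the JSON body from a Lambda HTTP event."""
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)


def enrich(record, fetch_extra=None):
    """Steps 2 and 3: optionally fetch more data, then add metadata."""
    enriched = dict(record)
    if fetch_extra is not None:
        # fetch_extra is a hypothetical callback that calls a third-party API
        enriched.update(fetch_extra(record))
    enriched["ingested_at"] = datetime.now(timezone.utc).isoformat()
    return enriched


def index_for(record):
    """Pick a target index (assumed scheme: one index per source, lowercase)."""
    return "events-" + str(record.get("source", "unknown")).lower()


def to_bulk_body(records):
    """Step 4: build a newline-delimited body for the OpenSearch _bulk API."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_for(record)}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


def ingest_handler(event, context=None, send_bulk=None, fetch_extra=None):
    """Lambda entry point; send_bulk is a hypothetical callable that POSTs to OpenSearch."""
    payload = parse_event(event)
    records = payload if isinstance(payload, list) else [payload]
    enriched = [enrich(r, fetch_extra) for r in records]
    if send_bulk is not None:
        send_bulk(to_bulk_body(enriched))
    return {"statusCode": 202, "body": json.dumps({"accepted": len(enriched)})}
```

Keeping the OpenSearch call behind `send_bulk` lets the parsing and enrichment logic be tested locally without any AWS resources; in a real deployment it would be replaced by a signed HTTP request to the domain's `_bulk` endpoint.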

I have been googling and exploring AWS, but so far I can't find anything that validates the above. Hence I request that the experts here comment, suggest, and direct me to a practical solution.

BAS
asked 2 years ago, 427 views
1 Answer

Hi, based on the ingestion rates, I think the architecture you envisioned should work well.
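
As a rough sanity check on those rates, the daily figures from the question can be converted to average events per second. This is only back-of-the-envelope arithmetic; it assumes the load is spread evenly over the day, which bursty traffic will not be.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(events_per_day):
    """Average events per second if the load were spread evenly over a day."""
    return events_per_day / SECONDS_PER_DAY


normal = average_rate_per_second(5_000)
peak = average_rate_per_second(20_000)
print(f"normal: {normal:.3f} events/s, peak: {peak:.3f} events/s")
# prints "normal: 0.058 events/s, peak: 0.231 events/s"
```

Even on a peak day the average is well under one event per second, so the main sizing concern is how tightly the bursts are packed rather than the daily total.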

It is a variant of this pattern; the difference is that you use the Lambda function to load OpenSearch rather than to trigger a search.

The other pattern you could consider is to use API Gateway as a proxy to Amazon Kinesis Firehose (please note that the tutorial is for Kinesis Streams and is meant only as a source of inspiration), then use Firehose to transform the data (it still uses Lambda functions for the transformations) and deliver it directly to OpenSearch.
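
For illustration, a Firehose transformation Lambda could look roughly like the sketch below. Firehose invokes the function with a batch of base64-encoded records and expects each one back with the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and the re-encoded `data`. The `transform` function and the `source` field are assumptions made up for the example.

```python
import base64
import json


def transform(doc):
    """Hypothetical transformation: normalise the 'source' field."""
    doc = dict(doc)
    doc["source"] = str(doc.get("source", "unknown")).lower()
    return doc


def firehose_handler(event, context=None):
    """Transform each Firehose record and report a per-record result."""
    output = []
    for record in event["records"]:
        try:
            doc = json.loads(base64.b64decode(record["data"]))
            payload = json.dumps(transform(doc)).encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload).decode("ascii"),
            })
        except (ValueError, TypeError):
            # Invalid base64 or JSON: hand the record back unchanged so
            # Firehose can treat it as a failed record.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```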

Some additional information on this can be

AWS
EXPERT
answered 2 years ago
  • Thanks for your response

    I am not able to find an ingestion pipeline other than Kinesis-based ones. I feel Kinesis is overkill for me, as I will not have continuous real-time data flowing; the data can arrive in bursts of events.

    Hence I am trying to see if there is anything lighter for processing in AWS (perhaps even an MQ-based Lambda or a pipeline); otherwise I guess I will have to set up an always-on pipeline service myself.
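
One possible shape for an MQ-based Lambda, sketched under the assumption that an Amazon SQS queue sits between the HTTP endpoint and the processing function, is shown below. Lambda passes SQS messages in `event["Records"]`, and if `ReportBatchItemFailures` is enabled on the event source mapping, the function can return the IDs of failed messages in `batchItemFailures` so that only those are retried. The `process` function, the `source` field, and the `sink` callback are hypothetical.

```python
import json


def process(doc):
    """Hypothetical per-event processing; raises on unusable input."""
    if "source" not in doc:
        raise ValueError("event has no 'source' field")
    return {**doc, "source": str(doc["source"]).lower()}


def sqs_handler(event, context=None, sink=None):
    """Process a batch of SQS messages and report partial failures."""
    failures = []
    for message in event.get("Records", []):
        try:
            doc = process(json.loads(message["body"]))
            if sink is not None:
                # sink is a hypothetical callable that writes to OpenSearch
                sink(doc)
        except (ValueError, TypeError):
            # Only this message is retried; the rest of the batch is kept.
            failures.append({"itemIdentifier": message["messageId"]})
    return {"batchItemFailures": failures}
```

With this arrangement the queue absorbs bursts and the function only runs while there are messages to process, so nothing has to stay always on.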
