
Ingesting data from external sources like Git, Slack, Zoom, Instagram, 3rd party systems


Problem

I want to validate and correct my understanding of, and approach to, setting up a data ingestion pipeline that collects "events" or "data" from any external (third-party) application source.

The ingestion rate is normally about 5,000 (5K) events per day; at peak it can go slightly above 20K.

Approach I have been thinking about

I am planning to set up an AWS Lambda endpoint to which external systems can send data via HTTP POST; the data is then loaded into OpenSearch to form a data lake.

The ingestion pipeline operates between Lambda and OpenSearch and performs the following:

  • Parsing of data
  • Fetch more data if needed, by making API calls
  • Process, transform, enrich
  • Post each document to the appropriate OpenSearch index
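
The steps above can be sketched as a single Lambda handler. This is only a minimal sketch under assumptions: the event is assumed to carry the POST payload as a JSON string in `body` (as with an API Gateway proxy integration or a Lambda function URL), and `fetch_extra`, `index_doc`, the `source` field, and the `events-<source>` index naming are hypothetical placeholders, not part of any AWS or OpenSearch API.

```python
import json
from datetime import datetime, timezone


def parse_event(raw_body):
    """Parse the HTTP POST body; raises ValueError if it is not a JSON object."""
    doc = json.loads(raw_body)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object")
    return doc


def enrich(doc, fetch_extra=None):
    """Optionally fetch more data (for example via an API call) and merge it in."""
    extra = fetch_extra(doc) if fetch_extra else {}
    return {**doc, **extra}


def transform(doc):
    """Normalize the document and stamp the ingestion time (UTC, ISO 8601)."""
    out = dict(doc)
    out.setdefault("source", "unknown")  # hypothetical field name
    out["ingested_at"] = datetime.now(timezone.utc).isoformat()
    return out


def pick_index(doc):
    """Choose a target index per source system (hypothetical naming scheme)."""
    return "events-" + str(doc["source"]).lower()


def handler(event, context=None, index_doc=None, fetch_extra=None):
    """Lambda entry point: parse, enrich, transform, then hand off for indexing.

    index_doc(index, doc) is an injected callable standing in for whatever
    OpenSearch client call is actually used; it is an assumption of this sketch.
    """
    try:
        doc = parse_event(event.get("body") or "")
    except ValueError as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}

    doc = transform(enrich(doc, fetch_extra))
    index = pick_index(doc)
    if index_doc is not None:
        index_doc(index, doc)
    return {"statusCode": 202, "body": json.dumps({"index": index})}
```

Keeping the indexing call injectable makes the handler testable without an OpenSearch domain, and leaves the choice of client open.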

I have been googling and exploring AWS, but so far I can't find anything that validates the above. Hence I request the experts here to comment, suggest, and direct me to a practical solution.

1 Answer

Hi, based on the ingestion rates, I think that the architecture you envisioned should work well.
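
To put the stated rates in perspective, here is a quick back-of-the-envelope calculation. It only converts the daily figures from the question into average events per second; real traffic may arrive in bursts, so this is an average and not a peak estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def avg_events_per_second(events_per_day):
    """Average rate if the daily volume were spread evenly over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


for label, per_day in (("normal", 5_000), ("peak", 20_000)):
    print(f"{label}: {avg_events_per_second(per_day):.3f} events/s on average")
# normal: 0.058 events/s on average
# peak: 0.231 events/s on average
```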

It is a variant of this pattern; the difference is that you use the Lambda to load OpenSearch and not to trigger a search.

The other pattern you could consider is to use API Gateway as a proxy to Amazon Kinesis Firehose (please note that the tutorial is for Kinesis Streams and is only meant as a source of inspiration), then use Firehose to transform the data (it will still use Lambda functions for the transformations) and deliver the data directly to OpenSearch.
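
As a rough illustration of the Firehose transformation step, the sketch below shows the shape of a transformation Lambda: Firehose passes a batch of records with base64-encoded `data`, and expects each `recordId` back with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The `transform_record` function and the `processed` field are hypothetical placeholders for your own logic, and the payloads are assumed to be JSON objects.

```python
import base64
import json


def transform_record(payload):
    """Hypothetical transformation: copy the payload and flag it as processed."""
    out = dict(payload)
    out["processed"] = True
    return out


def handler(event, context=None):
    """Firehose data-transformation Lambda: decode, transform, re-encode each record."""
    results = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            data = json.dumps(transform_record(payload)).encode("utf-8")
            results.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            })
        except (ValueError, TypeError):
            # Invalid base64/JSON or a non-object payload: report the record
            # as failed and return the original data unchanged.
            results.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": results}
```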

Some additional information on this can be

EXPERT
answered 6 months ago
  • Thanks for your response

    I am not able to see an ingestion pipeline other than a Kinesis-based one. I feel Kinesis is overkill for me, as I will not have continuous real-time data flowing; it can be bursts of events.

    Hence I am trying to see if there is anything lighter for processing in AWS (it could even be an MQ-based Lambda or a pipeline); otherwise I guess I will have to set up an always-on pipeline service myself.
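
One possible shape for the queue-based option mentioned here is a Lambda triggered by a message queue. The sketch below assumes Amazon SQS as the queue (the thread does not name one) and assumes that partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping, so that only failed messages are retried. The `process` function and the `source` field are hypothetical placeholders.

```python
import json


def process(doc):
    """Hypothetical placeholder for parse/enrich/transform/index logic."""
    if "source" not in doc:
        raise ValueError("missing 'source'")


def handler(event, context=None):
    """SQS-triggered Lambda: process each message, report only the failures."""
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except (ValueError, TypeError):
            # Returning the messageId asks SQS to retry just this message.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

With a queue in front, bursts are buffered and the function only runs when messages are waiting, so nothing needs to stay on between bursts.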
