How to build a data pipeline with one API Gateway?


We have one API Gateway that receives gzipped meter data 24x7. The data arrives concurrently (sometimes 5,000 POSTs per second, sometimes far fewer), and we are sure the compressed payloads won't exceed the 10 MB API Gateway payload limit.

We have two goals:

  1. Deliver the decompressed data to S3. Before that, we need to do some renaming and verification; e.g., we only accept payloads with a correctly decoded signature. So we have to use a Lambda function, and this one only stores the data.
  2. Ingest the data into a Lambda function that does some data processing and writes the results to a Timestream database.
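The decompress-and-verify step from goal 1 could look roughly like the Python sketch below. It assumes, hypothetically, that the sender signs the decompressed bytes with HMAC-SHA256 under a shared secret and sends the hex digest alongside the payload; the actual signature scheme isn't described here, and the function name `decode_and_verify` is illustrative.

```python
import gzip
import hashlib
import hmac
import zlib


def decode_and_verify(body: bytes, signature_hex: str, secret: bytes) -> bytes:
    """Decompress a gzipped meter payload and check its signature.

    Assumption (hypothetical): the sender signs the *decompressed* bytes with
    HMAC-SHA256 using a shared secret and sends the hex digest.
    Raises ValueError if the payload is not gzip or the signature is wrong.
    """
    try:
        raw = gzip.decompress(body)
    except (OSError, EOFError, zlib.error) as exc:
        raise ValueError("payload is not valid gzip") from exc
    expected = hmac.new(secret, raw, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the signature.
    if not hmac.compare_digest(expected, signature_hex):
        raise ValueError("signature mismatch")
    return raw
```

Raising `ValueError` for both failure modes lets the caller reject a payload before anything is written to S3 or Timestream.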

Currently, we use two Lambdas: one stores the data, and the other processes it and writes it to Timestream.

Please suggest a more efficient way to do this.

1 Answer

It appears you have three main tasks in your flow for processing these incoming gzip files:

  1. Accept incoming gzip files at a high rate
  2. Decompress and validate each file
  3. Process the validated files and push the data to the Timestream database

For greater efficiency, I'd recommend the following:

  1. Use an API Gateway direct (AWS service) integration to store each incoming gzip file directly in an S3 bucket. This link contains patterns to explore; the first one (direct proxy) is an excellent choice:
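As a rough illustration of step 1, the Python sketch below builds the keyword arguments for boto3's `apigateway` client `put_integration` call, proxying a `POST` on one resource straight to S3 `PutObject`. The bucket name, role ARN, and the use of `context.requestId` as the object key are assumptions made for the example, not details from the linked patterns.

```python
def s3_proxy_integration_params(rest_api_id, resource_id, region, bucket, role_arn):
    """Keyword arguments for boto3 apigateway put_integration: POST -> S3 PutObject."""
    return {
        "restApiId": rest_api_id,
        "resourceId": resource_id,
        "httpMethod": "POST",
        "type": "AWS",                   # AWS service integration, no Lambda in the path
        "integrationHttpMethod": "PUT",  # S3 PutObject is an HTTP PUT
        # Path-style S3 URI; {key} is filled in from requestParameters below.
        "uri": f"arn:aws:apigateway:{region}:s3:path/{bucket}/{{key}}",
        "credentials": role_arn,         # role API Gateway assumes; needs s3:PutObject
        "requestParameters": {
            # The per-request ID gives every concurrent POST a unique object key.
            "integration.request.path.key": "context.requestId",
            # Static value (note the single quotes) for the stored object's type.
            "integration.request.header.Content-Type": "'application/gzip'",
        },
    }
```

The dictionary would be passed as `boto3.client("apigateway").put_integration(**params)`. Because the payloads are gzipped binary, the REST API would most likely also need a binary media type such as `application/gzip` registered, plus method and integration responses, before this works end to end.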

  2. Set up an S3 event notification on the incoming bucket so that each new file invokes a Lambda function. Create a second S3 bucket where that Lambda stores the validated, decompressed files. The Lambda could also delete the original gzip file after processing. Here is a link describing a similar flow:
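A minimal Python sketch of the Lambda in step 2, assuming the standard S3 `ObjectCreated` event format, a hypothetical `meter-data-clean` destination bucket, and a placeholder `is_valid` check standing in for the real rename and signature rules. The S3 client can be injected so the handler can be exercised without AWS.

```python
import gzip
import urllib.parse
import zlib

CLEAN_BUCKET = "meter-data-clean"  # hypothetical destination bucket


def is_valid(raw: bytes) -> bool:
    """Hypothetical placeholder for the real rename/signature checks."""
    return len(raw) > 0


def handler(event, context, s3=None):
    """Invoked by S3 ObjectCreated notifications on the incoming bucket."""
    if s3 is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        s3 = boto3.client("s3")
    stored = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event notifications are URL-encoded (a space becomes '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        try:
            raw = gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            raw = None  # not a valid gzip stream
        if raw is None or not is_valid(raw):
            continue  # leave rejected payloads in place for inspection
        new_key = key.removesuffix(".gz")  # example "rename" step (Python 3.9+)
        s3.put_object(Bucket=CLEAN_BUCKET, Key=new_key, Body=raw)
        # Remove the original only after the clean copy has been written.
        s3.delete_object(Bucket=bucket, Key=key)
        stored.append(new_key)
    return {"stored": stored}
```

Keeping rejected objects in the incoming bucket leaves them available for inspection. The destination bucket must differ from the incoming one; otherwise each write would trigger the function again.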

  3. Configure the second S3 bucket with a trigger that invokes a second Lambda function, which processes the decompressed files and stores the data in the Timestream database.
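For step 3, the second Lambda mostly needs to turn decompressed files into Timestream records and write them in batches, since `WriteRecords` accepts at most 100 records per request. The Python sketch below assumes, hypothetically, that each decompressed file is JSON lines of the form `{"meter_id": ..., "ts": <epoch ms>, "value": ...}`; the dimension and measure names are also illustrative.

```python
import json

MAX_RECORDS_PER_WRITE = 100  # Timestream WriteRecords per-request limit


def to_records(lines):
    """Convert JSON-lines meter readings (hypothetical schema) into Timestream records."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        reading = json.loads(line)
        records.append({
            "Dimensions": [{"Name": "meter_id", "Value": str(reading["meter_id"])}],
            "MeasureName": "reading",
            "MeasureValue": str(reading["value"]),  # Timestream expects strings
            "MeasureValueType": "DOUBLE",
            "Time": str(reading["ts"]),
            "TimeUnit": "MILLISECONDS",
        })
    return records


def write_all(client, database, table, records):
    """Write records in chunks of at most 100; returns the number of API calls."""
    calls = 0
    for start in range(0, len(records), MAX_RECORDS_PER_WRITE):
        client.write_records(
            DatabaseName=database,
            TableName=table,
            Records=records[start:start + MAX_RECORDS_PER_WRITE],
        )
        calls += 1
    return calls
```

In the Lambda, `client` would be `boto3.client("timestream-write")`. A production version should also handle the `RejectedRecordsException` error, which Timestream returns when individual records are rejected.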

You could also explore using S3 event notifications to send events to SNS topics or SQS queues:
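With an SQS queue between S3 and Lambda, one invocation can receive a batch of notifications, which can reduce the number of invocations during bursts. The Python sketch below assumes S3 publishes directly to the queue (no SNS envelope) and that `ReportBatchItemFailures` is enabled on the event source mapping, so only failed messages are retried; `process_object` is a hypothetical placeholder for the validation or Timestream step.

```python
import json
import urllib.parse


def process_object(bucket: str, key: str) -> None:
    """Hypothetical placeholder for the real per-file work."""
    print(f"processing s3://{bucket}/{key}")


def handler(event, context, process=process_object):
    """SQS-triggered handler whose messages carry S3 event notifications."""
    failures = []
    for message in event["Records"]:
        try:
            body = json.loads(message["body"])
            # S3 sends an "s3:TestEvent" with no "Records" when notifications
            # are first configured; it is simply skipped here.
            for record in body.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                # Object keys arrive URL-encoded (a space becomes '+').
                key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
                process(bucket, key)
        except Exception:
            # Report only this message as failed so SQS retries just it.
            failures.append({"itemIdentifier": message["messageId"]})
    return {"batchItemFailures": failures}
```

If SNS sits between S3 and SQS instead, each body would be an SNS envelope unless raw message delivery is enabled, so the parsing would need one more unwrapping step.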

answered a year ago
  • Creating and deleting files (PUT and DELETE requests) will cost a lot. How about processing the data and saving it to S3, then reading it back for the other work?
