Lambda to DynamoDB throughput question

IHAC (I have a customer) who sent me the following email:

I'm working to use Lambda as our primary computation environment. So far, that amounts to funneling data ingested via API Gateway to various endpoints (often similar in effect to the AWS IoT rules engine) and using DynamoDB to store configuration data.

The obstacle I'm currently grappling with is the throughput limits on DynamoDB. In standard operation, we have a slow, steady stream of requests that doesn't begin to approach our limits. On rare occasions, however, I'll need to add a large data store, which, as things are set up, translates into a large number of near-simultaneous requests to DynamoDB. But we don't have a latency requirement: within reason, I don't care when this operation completes, just that it does. If I could space these requests out to stay below our limits, the problem would be solved.

In essence, I want our burst response to distribute the load over time as opposed to scaling up our systems.

Initially, I was trying to set up a scheduler: a function I could call to simply say "Try this Lambda function again in X.Y minutes" with CloudWatch Events. However, I ran into a different limitation there: only being able to make 5 CloudWatch API requests per second. I didn't solve the throughput issue so much as move it to a different service.
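For context, the "retry in X.Y minutes" scheduler the customer describes would look roughly like the sketch below. The helper, the rule naming, and the payload's `request_id` field are my assumptions, not their code; the point is that every deferred invocation costs a PutRule plus a PutTargets call, which is exactly what runs into the CloudWatch Events API request limit under a burst.

```python
import json
import boto3

events = boto3.client("events")

def schedule_retry(function_arn, payload, minutes):
    # Hypothetical helper: create a one-off CloudWatch Events rule that
    # re-invokes the function with the same payload after a delay.
    n = max(1, int(round(minutes)))
    unit = "minute" if n == 1 else "minutes"  # rate() needs the singular for 1
    rule_name = "retry-{}".format(payload["request_id"])  # assumed id field
    # Two control-plane calls per deferred message -- this is what hits
    # the CloudWatch Events API request limit when many retries arrive at once.
    events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate({} {})".format(n, unit),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn, "Input": json.dumps(payload)}],
    )
```

Each one-off rule also has to be cleaned up after it fires (RemoveTargets plus DeleteRule), adding yet more API calls per deferred message.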

I have a couple of different ways of solving this specific problem, but the overall scheduling design pattern is one I'm really interested in.

My initial thought is to introduce SQS between the API Gateway-fronted Lambda and DynamoDB. That Lambda would write the payload to SQS, then a CloudWatch metric would kick off an additional Lambda to process messages from the queue whenever the queue depth is greater than zero. If there is an issue writing to DynamoDB, the message is simply not removed from the queue and can be processed later. Concretely, I'm picturing something like the sketch below.
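Both halves sketched, with a placeholder queue URL and table name (the alarm itself would watch SQS's ApproximateNumberOfMessagesVisible metric and invoke the worker when it goes above zero):

```python
import json
import boto3

sqs = boto3.client("sqs")
table = boto3.resource("dynamodb").Table("config-table")  # placeholder name
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ingest-buffer"  # placeholder

def ingest_handler(event, context):
    # API Gateway-fronted Lambda: buffer the payload in SQS instead of
    # writing straight to DynamoDB, so bursts land in the queue.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(event))
    return {"statusCode": 202}

def drain_handler(event, context):
    # Worker Lambda, kicked off when the queue depth is greater than zero.
    # A message is deleted only after its DynamoDB write succeeds, so a
    # failed write reappears after the visibility timeout and is retried.
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
        messages = resp.get("Messages", [])
        if not messages:
            return
        for msg in messages:
            try:
                table.put_item(Item=json.loads(msg["Body"]))
            except Exception:
                continue  # leave it in flight; it becomes visible again later
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```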

Does that make sense, or is there a better suggestion for the customer?

1 Answer
Accepted Answer

I would suggest that you first send the data to SQS; from SQS you can then poll the newly ingested messages and send them to DynamoDB. With this system you can queue spikes of messages in SQS and later upload them to DynamoDB at a more "steady" throughput.
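A minimal sketch of that drain loop, assuming a hypothetical config-table and a self-imposed budget of 5 writes per second (pick a number below your provisioned write capacity):

```python
import json
import time
import boto3

sqs = boto3.client("sqs")
table = boto3.resource("dynamodb").Table("config-table")  # placeholder name
WRITES_PER_SECOND = 5  # assumed budget; keep below provisioned write capacity

def steady_drain(queue_url):
    # Long-poll for batches and pace the individual writes, so DynamoDB
    # sees a flat, sub-limit rate no matter how large the queued burst is.
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue is drained
        for msg in messages:
            table.put_item(Item=json.loads(msg["Body"]))
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"]
            )
            time.sleep(1.0 / WRITES_PER_SECOND)  # spread the load over time
```

Sleeping between writes is the crude version of pacing; backing off on ProvisionedThroughputExceededException would make it more robust.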

AWS
Expert
answered 7 years ago
