Best AWS (hopefully serverless) service to batch data?


Let's take post likes as a high-velocity data example.

A post receives 1000 likes per second. Posts are stored in DynamoDB with 2 GSIs to support different access patterns, which means ~3000 DynamoDB writes per second for a single post. And 1000 WCUs per partition is already DynamoDB's write limit.

But I'd like the system to support an even higher likes-per-second rate.


I don't care if the likes count is a few seconds behind the "real" count, so batching the likes would be the best option here!

So the flow should look like this:

  • send all the likes (for every post) to some service that has a HUUUGE writes-per-second limit
  • this service batches likes **per post** per time window (e.g. 900 likes per second with a 3-second batch window: 900 * 3 = 2700 likes rolled into one count per post)
  • the batched likes count is output once the batch window ends (e.g. every 3 seconds)
EC2 -> (1000s of likes per second) -> batch service -> (batched likes per post) -> EC2 -> DDB

So it would basically be like a buffer in front of DynamoDB to slow the writes down (see the sketch below).
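To make the idea concrete, here's a minimal sketch of the batching itself, independent of which AWS service would actually host it; the function names, window length, and the flush step are placeholders, not an actual implementation:

```python
import threading
import time
from collections import Counter

pending_likes = Counter()   # likes accumulated during the current window, keyed by post ID
lock = threading.Lock()

def record_like(post_id: str) -> None:
    # Called once per incoming like; a cheap in-memory increment instead of a DynamoDB write.
    with lock:
        pending_likes[post_id] += 1

def flush_loop(window_seconds: int = 3) -> None:
    # At the end of each window, emit one aggregated count per post.
    while True:
        time.sleep(window_seconds)
        with lock:
            batch = dict(pending_likes)
            pending_likes.clear()
        for post_id, count in batch.items():
            # Here a single write per post would go to DynamoDB, e.g. an "ADD likeCount :count" update.
            print(post_id, count)
```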

Any ideas? Is there a (serverless) AWS service that I could use for this? Or are Redis counters the best way?

asked 2 years ago
1 Answer

There is indeed a 1000 WCU per-partition limit, but that doesn't mean the table as a whole is limited to 1000 WCUs. If your app is well designed, that is, if your reads/writes are evenly distributed among partitions, you won't necessarily be constrained by the per-partition limit. You can't decide how many partitions your table will have; that's managed internally by DynamoDB, which adds partitions to scale horizontally. But roughly speaking, a DynamoDB partition will store up to 10 GB and serve up to 3000 RCUs or 1000 WCUs.

That being said, the more RCUs/WCUs, the higher the price, so simply increasing the provisioned capacity isn't necessarily a valid solution.

A very common pattern to control the write rate is putting an SQS queue in front of DynamoDB. Your app publishes write operations to this queue, which acts as a buffer of operations. You then process the messages in that queue with a Lambda function, and you control how fast you write to DynamoDB using the Lambda function's reserved concurrency and the SQS batch size.
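A rough sketch of what that Lambda consumer could look like, assuming SQS messages shaped like {"postId": ..., "likes": 1} and a hypothetical posts table with a likeCount attribute (the names are illustrative, not from the question):

```python
import json
from collections import Counter

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("posts")  # hypothetical table name


def handler(event, context):
    # SQS-triggered Lambda: one invocation receives up to "batch size" messages.
    # Aggregate likes per post within this batch so each post gets a single write.
    likes_per_post = Counter()
    for record in event["Records"]:
        body = json.loads(record["body"])          # e.g. {"postId": "abc", "likes": 1}
        likes_per_post[body["postId"]] += body.get("likes", 1)

    # One atomic counter update per post instead of one write per like.
    for post_id, count in likes_per_post.items():
        table.update_item(
            Key={"postId": post_id},
            UpdateExpression="ADD likeCount :inc",
            ExpressionAttributeValues={":inc": count},
        )
```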

On the other hand, to compute any data aggregation such as the likes count, I would recommend using DynamoDB Streams and Lambda to compute it in a deferred way.
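For example, if each like were written as its own item to a separate likes table, a stream-triggered Lambda could fold the INSERT records from each batch into a single counter update per post. The table and key names below are assumptions for illustration:

```python
from collections import Counter

import boto3

dynamodb = boto3.resource("dynamodb")
posts_table = dynamodb.Table("posts")  # hypothetical aggregate table


def handler(event, context):
    # Triggered by the likes table's stream; batch size and window are set on the event source mapping.
    new_likes = Counter()
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            keys = record["dynamodb"]["Keys"]
            new_likes[keys["postId"]["S"]] += 1   # stream images use DynamoDB JSON

    for post_id, count in new_likes.items():
        posts_table.update_item(
            Key={"postId": post_id},
            UpdateExpression="ADD likeCount :inc",
            ExpressionAttributeValues={":inc": count},
        )
```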

As you mention, another alternative would be to use ElastiCache (with the Redis flavor) as the datastore instead of DynamoDB. I don't know your exact requirements, but ElastiCache (or a Redis instance on EC2) isn't a serverless solution. Since Redis is an in-memory database, you can achieve higher throughput than with DynamoDB; however, you would need to figure out how to persist the data so you don't lose it when Redis is intentionally or accidentally shut down.
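If you go the Redis route, the usual pattern is atomic INCR counters plus a periodic flush back to a durable store. A sketch with made-up endpoint and table names (authentication and error handling omitted):

```python
import boto3
import redis

r = redis.Redis(host="my-cache.xxxxxx.cache.amazonaws.com", port=6379)  # hypothetical endpoint
posts_table = boto3.resource("dynamodb").Table("posts")                 # hypothetical table


def record_like(post_id: str) -> None:
    # O(1) in-memory counter; far higher throughput than one DynamoDB write per like.
    r.incr(f"likes:{post_id}")


def flush_to_dynamodb(post_ids: list[str]) -> None:
    # Run periodically (e.g. from a scheduled job) to persist and reset the counters.
    for post_id in post_ids:
        pending = int(r.getset(f"likes:{post_id}", 0) or 0)
        if pending:
            posts_table.update_item(
                Key={"postId": post_id},
                UpdateExpression="ADD likeCount :inc",
                ExpressionAttributeValues={":inc": pending},
            )
```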

cjuega
answered 2 years ago
