synchronous queue implementation on AWS


I have a queue in which producers add data and consumers read and process it.

In the diagram below, producers add data to a queue as tuples (Px, Tx, X), for example (P3, T3, 10): P3 is the producer ID, T3 means three packets are required to complete the unit of work, and 10 is the data.

For (P3, T3, 10), a consumer needs to read 3 packets from producer P3. So in the image below, one of the consumers needs to pick (P3, T3, 10), (P3, T3, 15), and (P3, T3, 5), apply a function that simply adds the numbers (10 + 15 + 5 = 30), and save 30 to the DB.

Similarly, for producer P1, (P1, T2, 1) and (P1, T2, 10) give sum = 1 + 10 = 11, which is saved to the DB.
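To make the desired per-producer function concrete, here is a minimal Java sketch; `Packet` and its field names are hypothetical stand-ins for the (Px, Tx, X) tuples above:

```java
import java.util.List;

// Hypothetical representation of one (Px, Tx, X) tuple.
record Packet(String producerId, int totalPackets, int data) {}

class Aggregation {
    // Sum the data of all packets from one producer, once all
    // totalPackets of them have been collected, e.g. 10 + 15 + 5 = 30.
    static int aggregate(List<Packet> packets) {
        return packets.stream().mapToInt(Packet::data).sum();
    }
}
```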

I have read about AWS Kinesis, but it has an issue: all consumers read the same data, which doesn't fit my case.

The major issue is how we can constrain consumers so that:

1 - The data queue is read synchronously.

2 - If one consumer has read (P1, T2, 1), then only that consumer can read the next packet from producer P1 (this point is the major issue for me, as the consumer needs to add those two numbers).

3 - This can also cause deadlock: some consumers will be forced to read from one particular producer only, because they have already read one packet from that producer and now must wait for its next packet to perform the addition.

I have also read about SQS and Amazon MQ, but the above challenges exist for them too.

Image: https://i.stack.imgur.com/7b3Mm.png

My current approach:

For N producers I have started N EC2 instances; producers send data to EC2 over WebSocket (WebSocket is not a requirement) and I can process it there easily. As you can see, having N EC2 instances to process N producers will cause budget issues. How can I improve on this solution?

  • How many producers? How many consumers? How many messages?

  • 100+ producers (this may increase going forward), and each producer can have a data rate of 20 KB/sec. The consumers are what I am trying to optimize; there is no limit on consumers.

1 Answer

I assume it is a given that the producers need to send their payload in multiple messages; otherwise, combine them and send them as a single message. If the reason for not consolidating is payload size, you could save the payload in S3/DynamoDB/etc. and send only a pointer to the data in the queue (see the sketch below).
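A minimal sketch of that pointer ("claim check") pattern using the AWS SDK for Java v2; the bucket name and queue URL are hypothetical placeholders:

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.sqs.SqsClient;

import java.util.UUID;

public class ClaimCheckProducer {
    private static final String BUCKET = "my-payload-bucket";  // hypothetical
    private static final String QUEUE_URL =                    // hypothetical
            "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue";

    public static void send(S3Client s3, SqsClient sqs, String producerId, byte[] payload) {
        // Store the large payload in S3 under a unique key...
        String key = producerId + "/" + UUID.randomUUID();
        s3.putObject(b -> b.bucket(BUCKET).key(key), RequestBody.fromBytes(payload));

        // ...and send only the pointer (the S3 key) through the queue.
        sqs.sendMessage(b -> b.queueUrl(QUEUE_URL).messageBody(key));
    }
}
```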

If you still need to send multiple messages, you can use Kafka (or its managed version, Amazon MSK). Create a single topic with multiple partitions and use the producer ID as the message key, which routes all messages from the same producer to the same partition within the topic. Then create a single consumer group that subscribes to that topic; within a consumer group, each partition is handled by exactly one consumer. Note that a consumer may still handle messages for multiple partitions, i.e., multiple producers, so it will need to keep the relevant data in memory until it has received all the messages for a unit of work. A sketch of such a consumer follows.
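A minimal sketch with the Apache Kafka Java client, assuming the message key is the producer ID and the value encodes "totalPackets,data" as a string; the broker address, topic, group ID, and `saveToDb` are hypothetical:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.*;

public class AggregatingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical broker
        props.put("group.id", "packet-aggregators");        // the single consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Per-producer buffers: a consumer may own several partitions,
        // i.e., several producers, so state is keyed by producer ID.
        Map<String, List<Integer>> pending = new HashMap<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("packets"));          // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> rec : records) {
                    String producerId = rec.key();           // e.g. "P3"
                    String[] parts = rec.value().split(","); // "totalPackets,data"
                    int total = Integer.parseInt(parts[0]);
                    int data = Integer.parseInt(parts[1]);

                    List<Integer> buf = pending.computeIfAbsent(producerId, k -> new ArrayList<>());
                    buf.add(data);
                    if (buf.size() == total) {
                        // All packets for this unit of work arrived: aggregate and persist.
                        int sum = buf.stream().mapToInt(Integer::intValue).sum();
                        saveToDb(producerId, sum);           // hypothetical persistence
                        pending.remove(producerId);
                    }
                }
            }
        }
    }

    static void saveToDb(String producerId, int sum) {
        System.out.printf("producer %s -> %d%n", producerId, sum);
    }
}
```

Because Kafka guarantees ordering only within a partition, keying by producer ID is what makes requirement 2 hold: all packets from one producer arrive, in order, at the one consumer that owns that partition.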

Uri, AWS Expert · answered 2 years ago · reviewed 1 month ago
