Calling a service (RPC) API from EMR or distributed map at scale


We want to call a service API (ECS-based) at scale, around 250K TPS, on AWS. We are looking for suggestions and want to evaluate whether we can do this from EMR or a distributed map. Please let me know if there are any suggestions or examples you can point us to.

We are expecting our service latency to be 10ms.

asked 7 months ago, 188 views
1 Answer

Without knowing the specifics, I can only give general guidance. To achieve latency under 10 ms consistently, the requester and the service need to be in the same AZ, and the service must already be instantiated, initialized, and ready to process requests. Network latency between AZs is in the single-digit milliseconds, but that excludes the time to negotiate a connection, process the request, and prepare the response, so some cross-AZ calls will not have a total latency under 10 ms. At this scale you will also need subnets large enough to supply IP addresses for all of the requesters and service instances. Also consider WebSockets or another technology that carries multiple request/response pairs over one connection, because creating a new connection for each request will impede your ability to scale.
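To make the scale concrete, here is a rough sizing sketch based on Little's law (average requests in flight = arrival rate x latency). The 250K TPS and 10 ms figures come from the question. The per-task throughput and the headroom fraction are assumptions for illustration only and should be replaced with benchmarked numbers.

```python
import math


def in_flight_requests(tps: float, latency_s: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return tps * latency_s


def tasks_needed(tps: float, per_task_tps: float, headroom: float = 0.3) -> int:
    """Tasks required to serve `tps` while keeping `headroom` of capacity spare."""
    usable_per_task = per_task_tps * (1.0 - headroom)
    return math.ceil(tps / usable_per_task)


TARGET_TPS = 250_000   # from the question
LATENCY_S = 0.010      # 10 ms, from the question
PER_TASK_TPS = 2_000   # ASSUMPTION: hypothetical figure, replace with a benchmark

concurrent = in_flight_requests(TARGET_TPS, LATENCY_S)
tasks = tasks_needed(TARGET_TPS, PER_TASK_TPS)

print(f"average requests in flight: {concurrent:.0f}")
print(f"tasks needed (with 30% headroom): {tasks}")
```

At 250K TPS and 10 ms this works out to about 2,500 requests in flight on average. If each of those opened its own connection, the connection setup cost would dominate, which is why reusing connections matters. The task count depends entirely on the assumed per-task throughput.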

Now with some assumptions:

  1. Your traffic ramps up to 250K TPS over time. If so, you can use auto scaling to grow the service as traffic increases. You will have to benchmark the request rate, the ramp time, the start-up and initialization time, and so on, to ensure you can add resources fast enough to keep up with the requests. Containers start faster than instances, so ECS is a good choice. EMR can be auto scaled, but I believe load balancing would be a challenge: accepting inbound connections on individual nodes is not a common use case, whereas making outbound connections from the instances is.
  2. Your traffic comes from multiple sources that are not directly under your control. You may want to consider a bulkhead architecture, in which sources are mapped to their own service bulkheads (a.k.a. cells) so that one customer's traffic doesn't affect another's. It also contains the impact of an impairment: if one cell is impaired, not all of your customers are affected.
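As an illustration of the bulkhead idea, here is a minimal sketch of mapping each traffic source to a cell with a stable hash, so that a given source always lands in the same cell. The function name, the source IDs, and the cell count are all hypothetical. A real deployment would more likely keep an explicit routing table so that sources can be moved between cells.

```python
import hashlib


def cell_for_source(source_id: str, num_cells: int) -> int:
    """Return a stable cell index in [0, num_cells) for a traffic source."""
    if num_cells <= 0:
        raise ValueError("num_cells must be positive")
    # SHA-256 gives the same result on every host and every run, unlike
    # Python's built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(source_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_cells


NUM_CELLS = 8  # ASSUMPTION: hypothetical number of cells

for source in ("customer-a", "customer-b", "customer-c"):
    print(source, "-> cell", cell_for_source(source, NUM_CELLS))
```

One trade-off of plain modulo hashing is that changing the number of cells remaps most sources, so it suits a fixed cell count better than one that grows often.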

I'd suggest you contact your AWS account team and talk with your solutions architect to dig deeper into the details and get help implementing your solution.

AWS
answered 6 months ago
