A question about inter-region latency.


According to this blog post from concurrencylabs, latency can differ widely from one region to another. The author ran tests to measure both inter- and intra-region latency and posted the results in several matrix tables.

Here's the matrix table that shows the inter- and intra-region latency results: https://www.concurrencylabs.com/img/posts/9-choose-region-wisely/ec2-s3-1mb-table.png

All of the intra-region latencies (marked in blue) vary wildly, with Oregon and Tokyo being big outliers. Oregon is by far the fastest, taking only 81 ms to transfer 1 MB from EC2 to S3 within the same region. Tokyo is the slowest of them all at 755 ms.

The ping latency results are interesting as well: https://www.concurrencylabs.com/blog/choose-your-aws-region-wisely/. Every region has an intra-region ping of around 3 ms except for Oregon. The ping latency from Oregon to Oregon is a whopping 211 ms.

My question is: Why do all these regions differ so much in latency? Shouldn't the infrastructure at each region be of similar quality?

asked a year ago, 469 views
1 Answer
Accepted Answer

I find the latency test result for Oregon to be quite out of the ordinary; I just ran a test with some instances across different AZs in Oregon and got the times I expected (single-digit milliseconds or less). I suspect there was some network anomaly while the test for the blog was running, but as always, I encourage customers to do their own tests because there are so many variables: operating systems, software versions, test stacks, application stacks, etc.
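For anyone who wants to run their own check, here is a minimal sketch that times TCP connection setup between two hosts. It is not the method used in the blog post or in my test above; the function names are made up for illustration, and TCP connect time is only a rough stand-in for ping round-trip time.

```python
import socket
import statistics
import time


def tcp_connect_times_ms(host, port, samples=5, timeout=3.0):
    """Return a list of TCP connect times (in milliseconds) to host:port."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection performs the TCP three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        times.append((time.perf_counter() - start) * 1000.0)
    return times


def summarize(times_ms):
    """Summarize samples; the minimum is usually closest to the raw network latency."""
    return {
        "min": min(times_ms),
        "median": statistics.median(times_ms),
        "max": max(times_ms),
    }
```

You would call it as `summarize(tcp_connect_times_ms(host, port))`, where `host` and `port` point at something listening on an instance in the other AZ or region (for example its SSH port), and repeat the run at different times of day.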

As for the 1 MB transfer table: if you look at the numbers and compare them to the 10 MB transfer table, you'll see a lot of transfer times that are very close to each other despite one transfer being ten times the size of the other. For small transfers (and 1 MB is a small transfer), a fair chunk of the total time is taken up by negotiating the TLS session, and then some time is taken by S3 to deal with the file that is being uploaded.
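One rough way to see this effect in your own measurements is to assume that transfer time is a fixed overhead plus a per-MB cost, and solve for both from a small and a large transfer. The sketch below does that; the linear model is a simplifying assumption, the function name is made up, and the sample numbers are illustrative rather than taken from the blog's tables.

```python
def split_fixed_and_per_mb(t_small_ms, t_large_ms, small_mb=1.0, large_mb=10.0):
    """Assume time = fixed + per_mb * size and solve for (fixed, per_mb) in ms."""
    per_mb = (t_large_ms - t_small_ms) / (large_mb - small_mb)
    fixed = t_small_ms - per_mb * small_mb
    return fixed, per_mb


# Illustrative numbers only: 300 ms for 1 MB and 390 ms for 10 MB.
fixed_ms, per_mb_ms = split_fixed_and_per_mb(300.0, 390.0)
print(f"fixed overhead: {fixed_ms:.0f} ms, per MB: {per_mb_ms:.0f} ms")
```

With those made-up inputs, almost all of the 1 MB transfer time is fixed overhead (session setup and service-side handling), which is why the 1 MB and 10 MB figures can end up so close to each other.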

Again, there are going to be variances in this particular benchmark, far greater than with the latency test, because S3 is a multi-tenant service. When the test is being run you simply don't know what else is going on: what operations other customers are running, or what is happening within the service itself. Naturally, this is why the blog post writer has run many tests to get a good set of data, but it's a good idea to run those tests across several days, weeks or even months to ensure that the data set is representative of the conditions being tested.

The great thing about creating a test suite and a set of data like that (especially if you're testing so that you have a benchmark to compare your production systems against) is that you can see when performance improves (or degrades) and (perhaps) do something about it, or at least take that data and explain why your system might be behaving in a different way.

To answer your final question: regions have all been built at different times and there will be differences in how they were constructed. Regions are always undergoing change: mostly expansion, but also new services being deployed and existing services (even ones you can't see, like the network) being upgraded. Each region is built on a different geography, so there's no way to get the same length of fibre runs between AZs in a different city, and the speed of light in glass (or the speed of signals in copper) does make a difference - it isn't infinite.
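To put a number on the fibre-length point, here is a back-of-the-envelope sketch of propagation delay in optical fibre. The refractive index of about 1.468 is a typical textbook figure for silica fibre, not an AWS value, and the 100 km distance is purely hypothetical. It ignores switching, queuing and routing detours.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in a vacuum


def fibre_rtt_ms(distance_km, refractive_index=1.468):
    """Round-trip propagation delay in ms over a fibre path of the given length."""
    speed_in_fibre = SPEED_OF_LIGHT_KM_PER_S / refractive_index  # km/s
    one_way_s = distance_km / speed_in_fibre
    return 2 * one_way_s * 1000.0


# Hypothetical 100 km fibre path: the round trip comes out at roughly 1 ms.
print(f"{fibre_rtt_ms(100):.2f} ms")
```

So every extra 100 km of fibre between two sites adds about a millisecond of round-trip time before any equipment delay is counted.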

TL;DR: Run your own tests if these types of numbers matter; there are many, many variables.

answered a year ago
reviewed a year ago
  • When I look at the inter-region ping test result for North Virginia against Frankfurt, I see that the ping is 169 ms. I'm currently testing an application that's running on localhost but is communicating with a database in the North Virginia region. For a simple query like SELECT 1 FROM table I get an average latency of about 552 milliseconds. That's a lot more than the reported 169 milliseconds. I live relatively near Frankfurt (Groningen, to be exact). Is the difference in latency caused by my using the public internet instead of the private AWS network to do the query? Thank you

  • A ping test is (generally) two packets: an echo request and an echo reply. That's it.

    A database query is part of a TCP session, which may or may not already be established. If it isn't, it takes three packets to start the session, then an unknown number of packets to authenticate to the database. After that, depending on the database, you might have only two packets (one for the query, one for the acknowledgment) and then the query response (an undetermined number of packets, depending on the size of the response).

    It's unsurprising that a database query takes longer than a ping test, because you're not testing only the round-trip time of the network; you're testing that plus the database response time, and that's dependent on the load on the database as well (and perhaps the CPU, memory and disk load on the server if it isn't single-purpose).

    Assuming a 169 ms round-trip time, 552/169 is approximately 3.3 round trips, which seems somewhat correct (two round trips, one for the query and one for the response, plus some database response time). But I'm guessing. The only way to know is to run the same test across multiple different scenarios.
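The arithmetic above can be sketched as a tiny model: total query time is the number of network round trips times the round-trip time, plus server-side processing. The function names and the 45 ms server time are hypothetical, chosen only to show how the 552 ms figure could break down; they are not measurements.

```python
def implied_round_trips(total_ms, rtt_ms, server_ms=0.0):
    """How many network round trips would explain total_ms at the given RTT."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return (total_ms - server_ms) / rtt_ms


def estimated_query_ms(rtt_ms, round_trips, server_ms=0.0):
    """Estimated query time for a number of round trips plus server time."""
    return round_trips * rtt_ms + server_ms


# Figures from the thread: 552 ms observed, 169 ms ping round-trip time.
print(f"{implied_round_trips(552, 169):.2f} round trips")
# Hypothetical breakdown: 3 round trips plus 45 ms of server-side time.
print(f"{estimated_query_ms(169, 3, server_ms=45):.0f} ms")
```

If the real number of round trips per query is higher (for example because each query opens a new connection and authenticates), the same model shows how quickly the total grows at a 169 ms round-trip time.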
