Lambda network bottleneck revisited: parallel GET requests from S3 Standard vs. S3 Express One Zone


The following AWS Lambda function tests reading 1,000 objects from S3 Standard vs. S3 Express One Zone. The objects are the same in both cases and are rather small (46,080 bytes each). For each of the two buckets, the function tries thread pools of varying sizes:

import concurrent.futures
import boto3
import botocore
import time
import json

def ms_now():
    return int(time.time_ns() / 1000000)

class Timer():
    def __init__(self, timestamp_function=ms_now):
        self.timestamp_function = timestamp_function
        self.start = self.timestamp_function()

    def stop(self):
        return self.timestamp_function() - self.start

def get_object(s3_client, bucket_name, key_name):
    response = s3_client.get_object(Bucket=bucket_name, Key=key_name)
    body = response['Body'].read()
    return body

def worker(bucket_name, i):
    return get_object(client, bucket_name, f"test/{i}/vectors")

session = boto3.session.Session()
config = botocore.client.Config(max_pool_connections=20)
client = session.client('s3', region_name="us-east-1", config=config)

def lambda_handler(event, context):
    for bucket_name in ['test-bucket--use1-az4--x-s3', 'test-bucket']:
        print(f"\n----------------\n{bucket_name}\n----------------\n")
        for n_workers in [1, 2, 4, 8, 16, 17, 18, 19, 20]:
            timer = Timer()
            try:
                with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as executor:
                    futures = []
                    for i in range(1000):
                        futures.append(executor.submit(worker, bucket_name, i))
                    for future in concurrent.futures.as_completed(futures):
                        worker_result = future.result()
            except Exception as e:
                return {'error': str(e)}
            print(f"n_workers={n_workers}   time: {timer.stop()}ms", flush=True)
    return {
        'statusCode': 200,
        'body': json.dumps('Success!')
    }

The function runs in the same region in which both buckets live. It is not connected to a VPC. I gave the function 10,240 MB of RAM to get the best network performance possible. Note, however, that the objects add up to less than 50 MB in total. Here is the output:

----------------
test-bucket--use1-az4--x-s3
----------------
n_workers=1   time: 7831ms
n_workers=2   time: 4037ms
n_workers=4   time: 2625ms
n_workers=8   time: 2826ms
n_workers=16   time: 2668ms
n_workers=17   time: 2485ms
n_workers=18   time: 2462ms
n_workers=19   time: 2511ms
n_workers=20   time: 2410ms
----------------
test-bucket
----------------
n_workers=1   time: 47625ms
n_workers=2   time: 17334ms
n_workers=4   time: 6993ms
n_workers=8   time: 3508ms
n_workers=16   time: 2868ms
n_workers=17   time: 2690ms
n_workers=18   time: 2917ms
n_workers=19   time: 2660ms
n_workers=20   time: 2587ms

With one worker, S3 Express One Zone is about 6 times faster (7,831 ms vs. 47,625 ms). However, as the number of threads increases, both buckets converge to roughly the same performance of about 2.5 seconds to read 1,000 objects.
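A quick back-of-the-envelope check of what that ~2.5-second plateau means in terms of effective throughput, using the object size and count from the test above:

```python
# Effective throughput at the ~2.5 s plateau observed for both buckets.
object_size = 46_080                  # bytes per object (from the test setup)
n_objects = 1_000
total_bytes = object_size * n_objects  # ~46 MB in total

plateau_seconds = 2.5                  # approximate converged time for both buckets
throughput_mb_s = total_bytes / plateau_seconds / 1_000_000
throughput_mbit_s = throughput_mb_s * 8

print(f"total data: {total_bytes / 1_000_000:.1f} MB")
print(f"throughput: {throughput_mb_s:.1f} MB/s (~{throughput_mbit_s:.0f} Mbit/s)")
```

That works out to roughly 18 MB/s, which seems far below the raw network bandwidth a 10 GB Lambda should be able to sustain, so the plateau looks more like per-request overhead than a bandwidth ceiling.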

It looks like the bottleneck has to do with the network layer of the AWS Lambda instance. What is this bottleneck all about? Is it possible to overcome it to take advantage of the significantly lower latency of S3 Express One Zone?

P.S. As a small side point, it is interesting that doubling the number of threads from 1 to 2, and again from 2 to 4, improves the time for the S3 Standard bucket by almost a factor of three each time...
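For the record, the per-doubling speedup factors behind that observation, computed from the S3 Standard timings above:

```python
# Speedup from each doubling of workers for the S3 Standard bucket (times in ms,
# taken from the output above).
times = {1: 47625, 2: 17334, 4: 6993, 8: 3508}

for n in (1, 2, 4):
    speedup = times[n] / times[2 * n]
    print(f"{n} -> {2 * n} workers: {speedup:.2f}x")
```

The superlinear speedup (2.75x and 2.48x) shows up only for the first two doublings; from 4 to 8 workers it drops back to an ordinary ~2x.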

asked 5 months ago · 247 views
No Answers
