Lambda network bottleneck revisited: parallel GET requests from S3 Standard vs. S3 Express One Zone


The following AWS Lambda function tests reading 1,000 objects from S3 Standard vs. S3 Express One Zone. The objects are the same in both cases and are rather small (46,080 bytes each). For each of the two buckets, the function tries thread pools of varying sizes:

import concurrent.futures
import boto3
import botocore
import time
import json

def ms_now():
    return int(time.time_ns() / 1000000)

class Timer():
    def __init__(self, timestamp_function=ms_now):
        self.timestamp_function = timestamp_function
        self.start = self.timestamp_function()

    def stop(self):
        return self.timestamp_function() - self.start

def get_object(s3_client, bucket_name, key_name):
    response = s3_client.get_object(Bucket=bucket_name, Key=key_name)
    body = response['Body'].read()
    return body

def worker(bucket_name, i):
    return get_object(client, bucket_name, f"test/{i}/vectors")

session = boto3.session.Session()
config = botocore.client.Config(max_pool_connections=20)
client = session.client('s3', region_name="us-east-1", config=config)

def lambda_handler(event, context):
    for bucket_name in ['test-bucket--use1-az4--x-s3', 'test-bucket']:
        print(f"\n----------------\n{bucket_name}\n----------------\n")
        for n_workers in [1, 2, 4, 8, 16, 17, 18, 19, 20]:
            timer = Timer()
            try:
                with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as executor:
                    futures = []
                    for i in range(1000):
                        futures.append(executor.submit(worker, bucket_name, i))
                    for future in concurrent.futures.as_completed(futures):
                        worker_result = future.result()
            except Exception as e:
                return {'error': str(e)}
            print(f"n_workers={n_workers}   time: {timer.stop()}ms", flush=True)
    return {
        'statusCode': 200,
        'body': json.dumps('Success!')
    }

The function runs in the same region in which both buckets live. It is not connected to a VPC. I gave the function 10,240 MB of RAM to get the best network performance possible. Note, however, that the objects add up to less than 50 MB in total. Here is the output:

----------------
test-bucket--use1-az4--x-s3
----------------
n_workers=1   time: 7831ms
n_workers=2   time: 4037ms
n_workers=4   time: 2625ms
n_workers=8   time: 2826ms
n_workers=16   time: 2668ms
n_workers=17   time: 2485ms
n_workers=18   time: 2462ms
n_workers=19   time: 2511ms
n_workers=20   time: 2410ms
----------------
test-bucket
----------------
n_workers=1   time: 47625ms
n_workers=2   time: 17334ms
n_workers=4   time: 6993ms
n_workers=8   time: 3508ms
n_workers=16   time: 2868ms
n_workers=17   time: 2690ms
n_workers=18   time: 2917ms
n_workers=19   time: 2660ms
n_workers=20   time: 2587ms

With one worker, S3 Express One Zone is about 6 times faster (7,831 ms vs. 47,625 ms). However, as the number of threads increases, both buckets converge to roughly the same performance of about 2.5 seconds to read 1,000 objects.
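A quick back-of-the-envelope check of what that ~2.5-second plateau means in terms of effective throughput, using the object size and count from the test above:

```python
# Effective throughput at the ~2.5 s plateau observed for both buckets.
object_size = 46_080                  # bytes per object (from the test setup)
n_objects = 1_000
total_bytes = object_size * n_objects  # ~46 MB in total

plateau_seconds = 2.5                  # approximate converged time for both buckets
throughput_mb_s = total_bytes / plateau_seconds / 1_000_000
throughput_mbit_s = throughput_mb_s * 8

print(f"total data: {total_bytes / 1_000_000:.1f} MB")
print(f"throughput: {throughput_mb_s:.1f} MB/s (~{throughput_mbit_s:.0f} Mbit/s)")
```

That works out to roughly 18 MB/s, which seems far below the raw network bandwidth a 10 GB Lambda should be able to sustain, so the plateau looks more like per-request overhead than a bandwidth ceiling.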

It looks like the bottleneck has to do with the network layer of the AWS Lambda instance. What is this bottleneck all about? Is it possible to overcome it to take advantage of the significantly lower latency of S3 Express One Zone?

P.S. As a small side point, it is interesting that doubling the number of threads from 1 to 2, and again from 2 to 4, improves the time for the S3 Standard bucket by almost a factor of three each time...
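For the record, the per-doubling speedup factors behind that observation, computed from the S3 Standard timings above:

```python
# Speedup from each doubling of workers for the S3 Standard bucket (times in ms,
# taken from the output above).
times = {1: 47625, 2: 17334, 4: 6993, 8: 3508}

for n in (1, 2, 4):
    speedup = times[n] / times[2 * n]
    print(f"{n} -> {2 * n} workers: {speedup:.2f}x")
```

The superlinear speedup (2.75x and 2.48x) shows up only for the first two doublings; from 4 to 8 workers it drops back to an ordinary ~2x.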

asked 5 months ago · 247 views
No Answers
