Accessing S3 Objects takes time


Hi, I have recently started using AWS, so I have little knowledge of the AWS services. I have software that accesses objects in S3 (images, ~4 MB each). I am facing slow downloads from my S3 bucket: it takes about 5 minutes to download around 64 MB of data. I am using boto3 in a Lambda function to iterate through all the images and create a zip file, which I then download. Can someone help me understand what is wrong, or help me refine my approach?

Sayali
asked a month ago · 273 views
1 Answer

Firstly, I suggest increasing the memory allocated to the Lambda function from the default 128 MB to a higher value, since Lambda allocates CPU proportionally to memory and this should improve your download speed. From the documentation:

However, 128 MB should usually only be used for the simplest of Lambda functions, such as those that transform and route events to other AWS services. If the function imports libraries or Lambda layers, or interacts with data loaded from S3 or EFS, it’s likely to be more performant with a higher memory allocation.

The amount of memory also determines the amount of virtual CPU available to a function. Adding more memory proportionally increases the amount of CPU, increasing the overall computational power available. If a function is CPU-, network- or memory-bound, then changing the memory setting can dramatically improve its performance.
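You can change this in the console under Configuration → General configuration → Memory, or programmatically. As a minimal sketch, assuming a function named my-zip-function (a placeholder, substitute your actual function name), raising the memory with boto3 would look like this:

import boto3

lambda_client = boto3.client('lambda')

# Raise the memory allocated to the function; Lambda scales CPU
# proportionally with this value. 'my-zip-function' is a placeholder.
lambda_client.update_function_configuration(
    FunctionName='my-zip-function',
    MemorySize=1024,  # in MB; default is 128, maximum is 10,240
)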

In addition, instead of downloading the files sequentially, you can download them in parallel by using concurrent.futures to run multiple threads.

Here is some sample code:

import boto3
import concurrent.futures
import os

# A single shared client: unlike boto3 resources, low-level clients
# are thread-safe, so it can be used from multiple download threads.
s3_client = boto3.client('s3')

# Function to download a single file
def download_file(bucket_name, s3_key, local_path):
    s3_client.download_file(bucket_name, s3_key, local_path)
    print(f"Downloaded {s3_key} to {local_path}")

# Function to download multiple files in parallel
def download_files_in_parallel(bucket_name, s3_keys, local_dir):
    # Ensure the local directory exists
    os.makedirs(local_dir, exist_ok=True)

    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(download_file, bucket_name, s3_key,
                            os.path.join(local_dir, os.path.basename(s3_key)))
            for s3_key in s3_keys
        ]

        # Wait for all downloads, re-raising any exception from a worker
        # so failures are not silently swallowed.
        for future in concurrent.futures.as_completed(futures):
            future.result()

# Function to list all files in the S3 bucket
def list_files_in_bucket(bucket_name):
    s3_keys = []

    # A paginator handles the continuation tokens for buckets
    # holding more than 1,000 objects.
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get('Contents', []):
            s3_keys.append(obj['Key'])

    return s3_keys

def lambda_handler(event, context):
    # Example usage
    bucket_name = 'your-bucket-name'
    local_dir = '/tmp/downloaded_files'  # Lambda's writable directory

    # Get the list of files in the bucket
    s3_keys = list_files_in_bucket(bucket_name)

    if s3_keys:
        # Download files in parallel
        download_files_in_parallel(bucket_name, s3_keys, local_dir)
        return {
            'statusCode': 200,
            'body': 'Files downloaded successfully'
        }
    else:
        return {
            'statusCode': 404,
            'body': 'No files found in the bucket'
        }
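Finally, since you mentioned that you zip the images before downloading them: here is a minimal sketch of that last step, reusing the s3_client and os imports from the sample above. The archive name images.zip and the upload step are illustrative assumptions, not something your setup requires:

import zipfile

def zip_and_upload(local_dir, bucket_name, zip_key='images.zip'):
    zip_path = '/tmp/images.zip'

    # Bundle the downloaded files into one archive. ZIP_STORED skips
    # re-compression, which is usually faster for already-compressed images.
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_STORED) as zf:
        for name in os.listdir(local_dir):
            zf.write(os.path.join(local_dir, name), arcname=name)

    # Upload the archive so it can be fetched with a single GET request
    # (for example via a presigned URL) instead of object by object.
    s3_client.upload_file(zip_path, bucket_name, zip_key)
    return zip_key

Note that /tmp is limited to 512 MB by default (configurable up to 10 GB via ephemeral storage), so both the downloaded files and the archive have to fit within it.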
AWS
EXPERT
answered a month ago
EXPERT
reviewed a month ago
