First, I suggest increasing the default 128 MB of memory allocated to the Lambda function to a higher value, since Lambda's CPU allocation scales proportionally with memory and this should improve your download speed. From the documentation:
However, 128 MB should usually only be used for the simplest of Lambda functions, such as those that transform and route events to other AWS services. If the function imports libraries or Lambda layers, or interacts with data loaded from S3 or EFS, it’s likely to be more performant with a higher memory allocation.
The amount of memory also determines the amount of virtual CPU available to a function. Adding more memory proportionally increases the amount of CPU, increasing the overall computational power available. If a function is CPU-, network- or memory-bound, then changing the memory setting can dramatically improve its performance.
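For example, you can raise the memory setting programmatically with boto3 (a minimal sketch; the function name and the 1024 MB value are placeholders, and the same setting is available in the Lambda console under Configuration > General configuration):

import boto3

lambda_client = boto3.client('lambda')

# Raise the memory allocation; the CPU share scales with it.
# 'my-download-function' and 1024 are placeholder values to adjust for your function.
lambda_client.update_function_configuration(
    FunctionName='my-download-function',
    MemorySize=1024
)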
In addition, instead of downloading the files sequentially, you can download them in parallel by using concurrent.futures to run multiple threads.
Here is a code sample:
import boto3
import concurrent.futures
import os

# Initialize the S3 resource
s3_resource = boto3.resource('s3')

# Download a single file from S3 to a local path
def download_file(bucket_name, s3_key, local_path):
    bucket = s3_resource.Bucket(bucket_name)
    bucket.download_file(s3_key, local_path)
    print(f"Downloaded {s3_key} to {local_path}")

# Download multiple files in parallel using a thread pool
def download_files_in_parallel(bucket_name, s3_keys, local_dir):
    # Ensure the local directory exists
    os.makedirs(local_dir, exist_ok=True)

    futures = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for s3_key in s3_keys:
            # Note: keys with the same basename under different prefixes will overwrite each other
            local_path = os.path.join(local_dir, os.path.basename(s3_key))
            futures.append(executor.submit(download_file, bucket_name, s3_key, local_path))
        # Wait for all downloads and surface any exception raised in a worker thread
        for future in concurrent.futures.as_completed(futures):
            future.result()

# List all object keys in the S3 bucket, handling pagination
def list_files_in_bucket(bucket_name):
    s3_client = boto3.client('s3')
    s3_keys = []
    continuation_token = None
    while True:
        if continuation_token:
            response = s3_client.list_objects_v2(Bucket=bucket_name, ContinuationToken=continuation_token)
        else:
            response = s3_client.list_objects_v2(Bucket=bucket_name)
        for obj in response.get('Contents', []):
            s3_keys.append(obj['Key'])
        if response.get('IsTruncated'):  # More results are available
            continuation_token = response['NextContinuationToken']
        else:
            break
    return s3_keys

def lambda_handler(event, context):
    # Example usage
    bucket_name = 'your-bucket-name'
    local_dir = '/tmp/downloaded_files'  # /tmp is Lambda's only writable directory

    # Get the list of files in the bucket
    s3_keys = list_files_in_bucket(bucket_name)
    if s3_keys:
        # Download files in parallel
        download_files_in_parallel(bucket_name, s3_keys, local_dir)
        return {
            'statusCode': 200,
            'body': 'Files downloaded successfully'
        }
    else:
        return {
            'statusCode': 404,
            'body': 'No files found in the bucket'
        }
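If individual objects are large, boto3's managed transfer settings can also parallelize the download of a single object; a minimal sketch, where the bucket name, key, and tuning values are placeholders:

import boto3
from boto3.s3.transfer import TransferConfig

s3_resource = boto3.resource('s3')

# Download one large object in concurrent 8 MB parts, using up to 10 threads.
# The bucket name, key, and tuning values below are placeholders.
config = TransferConfig(max_concurrency=10, multipart_chunksize=8 * 1024 * 1024)
s3_resource.Bucket('your-bucket-name').download_file(
    'path/to/large-object.bin',
    '/tmp/large-object.bin',
    Config=config,
)

Also note that Lambda's /tmp storage defaults to 512 MB (configurable up to 10,240 MB), so if the combined size of the files is larger than that you will need to raise the ephemeral storage setting as well.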