To convert all files in an AWS S3 bucket to zip files


asked 9 months ago | 7926 views
2 Answers
Accepted Answer

S3 is object storage, not a file system, so objects cannot be zipped in place. You have to download the objects from S3, zip them, and upload the archives back to S3; the intermediate data can live on disk or, if it fits, in memory. Once you have verified that the zip upload succeeded for all objects, you can consider archiving or deleting the originals, depending on how critical the data is.
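
As a rough illustration of the in-memory route and of the verification step, here is a minimal sketch using only the Python standard library. The helper names `zip_bytes` and `unzip_bytes` are made up for this example; in a real job the bytes would come from and go back to S3.

```python
import io
import zipfile


def zip_bytes(name: str, data: bytes) -> bytes:
    """Return a zip archive (as bytes) holding a single member called `name`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, data)
    # getvalue() returns the whole buffer regardless of the current position
    return buffer.getvalue()


def unzip_bytes(archive_bytes: bytes) -> dict:
    """Read an archive back into a {member name: content} mapping."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Round-trip check: only treat the original as safe to delete if this holds
original = b"hello from S3"
archive = zip_bytes("report.txt", original)
assert unzip_bytes(archive) == {"report.txt": original}
```

Reading the archive back before removing the originals is one simple way to do the "verify first" step mentioned above.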

You can do this from an EC2 instance, a Lambda function, or your local machine (through the CLI). Lambda has a maximum execution time of 900 seconds (15 minutes). Treat the local machine as the last option, because downloading and uploading from outside AWS may add data transfer costs.
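
If you go with Lambda, one way to live with the 900-second limit is to stop before the timeout and hand the remaining keys to a later invocation. The sketch below assumes the standard Lambda `context` object and its `get_remaining_time_in_millis()` method; `process_until_deadline`, `handle_key` and the 10-second safety margin are hypothetical names and values chosen for illustration.

```python
def process_until_deadline(keys, handle_key, context, safety_margin_ms=10_000):
    """Process keys in order and return the ones that were not reached.

    `keys` is a list of object keys, `handle_key` is any callable that
    zips and uploads one key, and `context` is the Lambda context object.
    """
    for index, key in enumerate(keys):
        # Stop while there is still time left to return cleanly
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            return list(keys[index:])
        handle_key(key)
    return []
```

The returned list could be passed to a follow-up invocation; how you do that (for example, a queue or a re-invoke) is a separate design decision.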

Hope this explanation helps and gives you a direction for how to move forward.

answered 9 months ago
reviewed 9 months ago

To convert the files in your S3 bucket into zip files, you can use AWS Lambda (Python) with the AWS SDK for Python (Boto3).

  • The code below converts each object into its own zip file (if you have 10 objects, you will have 10 zip files):
import boto3
import zipfile
import io

s3 = boto3.client('s3')

def lambda_handler(event, context):
    source_bucket = 'your-source-bucket'
    target_bucket = 'your-target-bucket'
    # List objects in the source bucket
    response = s3.list_objects_v2(Bucket=source_bucket)
    if 'Contents' in response:
        objects = response.get('Contents', [])
        for obj in objects:
            key = obj.get('Key')
            if key:
                # Get object content
                response = s3.get_object(Bucket=source_bucket, Key=key)
                content = response['Body'].read()
                # Create a zip file in-memory
                zip_buffer = io.BytesIO()
                with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zipf:
                    zipf.writestr(key, content)
                # Rewind the buffer so the upload starts at the first byte
                zip_buffer.seek(0)
                # Upload the zip file to the target bucket
                target_key = key + '.zip'
                s3.upload_fileobj(zip_buffer, target_bucket, target_key)
    else:
        print("No objects found in the source bucket.")

  • To convert the entire contents of the bucket into one single zip file, you can use this code:
import boto3
import zipfile
import io

s3 = boto3.client('s3')

def lambda_handler(event, context):
    source_bucket = 'your-source-bucket'
    target_bucket = 'your-target-bucket'
    zip_file_name = 'your-archive.zip'  # Name (object key) of the zip file; must not be empty
    # List objects in the source bucket
    response = s3.list_objects_v2(Bucket=source_bucket)
    if 'Contents' in response:
        objects = response.get('Contents', [])
        # Create a zip file in-memory
        zip_buffer = io.BytesIO()
        with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for obj in objects:
                key = obj.get('Key')
                if key:
                    # Get object content
                    response = s3.get_object(Bucket=source_bucket, Key=key)
                    content = response['Body'].read()
                    # Write the file to the zip archive
                    zipf.writestr(key, content)
        # Rewind the buffer so the upload starts at the first byte
        zip_buffer.seek(0)
        # Upload the zip file to the target bucket
        s3.upload_fileobj(zip_buffer, target_bucket, zip_file_name)
    else:
        print("No objects found in the source bucket.")
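
One caveat with both handlers: `list_objects_v2` returns at most 1,000 keys per call, so a larger bucket needs pagination. A minimal sketch using the Boto3 paginator follows; `iter_keys` is a hypothetical helper name, and the client is passed in as a parameter so the function does not depend on how you create it.

```python
def iter_keys(s3_client, bucket, prefix=""):
    """Yield every object key in `bucket` (optionally under `prefix`)."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no 'Contents' entry when nothing matches
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   for key in iter_keys(boto3.client("s3"), "your-source-bucket"):
#       print(key)
```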

answered 9 months ago
reviewed 2 months ago
    1. Did you get the descriptions of the two scripts backward? The top one looks like it creates individual zip files and the bottom one a single zip, based on the variable for the zip file name in the bottom one and on target_key = key + '.zip' in the top one.

    2. If my bucket contained a collection of folders, would each folder be zipped into a separate zip file in the target bucket? Or would it recursively go through each folder and make a thousand little zips?


  • I used your script to zip up select folders. The script only handled a single folder path inside the target folder: if folderA contains six subfolders, only the first subfolder is copied to the zip. It does give me a starting point, though. Going to try something with shutil.make_archive.
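
On the folder question: S3 has no real folders, only keys that contain "/" separators, so "one zip per folder" comes down to grouping keys by their first path segment and writing one archive per group. A possible sketch of the grouping step follows. `group_keys_by_top_folder` is a hypothetical helper, and putting keys without a "/" under an empty-string group is just one convention.

```python
from collections import defaultdict


def group_keys_by_top_folder(keys):
    """Map each top-level 'folder' name to the list of keys beneath it."""
    groups = defaultdict(list)
    for key in keys:
        if key.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        top, separator, _rest = key.partition("/")
        # Keys with no "/" sit at the bucket root; group them under ""
        groups[top if separator else ""].append(key)
    return dict(groups)


keys = ["folderA/sub1/a.txt", "folderA/sub2/b.txt", "folderB/c.txt", "root.txt"]
for folder, members in group_keys_by_top_folder(keys).items():
    print(folder or "(root)", len(members))
```

Each group can then be written into its own archive with `zipf.writestr(key, content)`, which keeps the subfolder paths inside the zip.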

