To convert all files in an AWS S3 bucket to zip files

KARTHIK
asked 8 months ago · 6,919 views
2 Answers
Accepted Answer

S3 is object storage, not a file system, so there is no server-side "convert to zip" operation. Unless you do the whole thing in memory, you'll have to download the objects from S3, zip them, and upload the archives back to S3. Once you've verified that every zip uploaded successfully, you can consider archiving or deleting the original objects, depending on how critical the data is.

You can do this from an EC2 instance, a Lambda function, or your local machine (through the CLI). Keep in mind that Lambda has a hard limit of 900 seconds per invocation. Treat the local machine as the last option for downloading/uploading, since that may add data transfer cost.
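
If you do go the local-machine route, a rough AWS CLI sketch looks like this (the bucket names and local path are placeholders, not values from the question):

# Download everything, zip it locally, then upload the archive back
aws s3 sync s3://your-source-bucket ./s3-download
zip -r all_files.zip ./s3-download
aws s3 cp all_files.zip s3://your-target-bucket/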

Hope this explanation helps and gives you a direction for how to move forward.

Comment here if you have additional questions.

Happy to help.

Abhishek

AWS EXPERT · answered 8 months ago
iBehr (AWS EXPERT) · reviewed 8 months ago

To convert the files in your S3 bucket into zip files, you can use AWS Lambda (Python) with the AWS SDK for Python (Boto3).

  • The code below turns each object in the bucket into its own zip file (if you have 10 objects, you will get 10 zip files):
import boto3
import zipfile
import io

s3 = boto3.client('s3')

def lambda_handler(event, context):
    source_bucket = 'your-source-bucket'
    target_bucket = 'your-target-bucket'

    # Paginate so buckets with more than 1,000 objects are fully processed
    # (a single list_objects_v2 call returns at most 1,000 keys)
    paginator = s3.get_paginator('list_objects_v2')
    found_objects = False

    for page in paginator.paginate(Bucket=source_bucket):
        for obj in page.get('Contents', []):
            found_objects = True
            key = obj['Key']

            # Read the object's content into memory
            content = s3.get_object(Bucket=source_bucket, Key=key)['Body'].read()

            # Create an in-memory zip file holding just this object
            zip_buffer = io.BytesIO()
            with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zipf:
                zipf.writestr(key, content)

            # Upload the zip file to the target bucket
            zip_buffer.seek(0)
            s3.upload_fileobj(zip_buffer, target_bucket, key + '.zip')

    if not found_objects:
        print("No objects found in the source bucket.")

  • To combine the entire contents of the bucket into one single zip file, use this code instead:
import boto3
import zipfile
import io

s3 = boto3.client('s3')

def lambda_handler(event, context):
    source_bucket = 'your-source-bucket'
    target_bucket = 'your-target-bucket'
    zip_file_name = 'all_files.zip'  # Name of the combined zip file

    # The whole archive is built in memory, so the function's memory
    # allocation must be large enough to hold all of the source data
    zip_buffer = io.BytesIO()
    found_objects = False

    with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zipf:
        # Paginate so buckets with more than 1,000 objects are fully included
        paginator = s3.get_paginator('list_objects_v2')
        for page in paginator.paginate(Bucket=source_bucket):
            for obj in page.get('Contents', []):
                found_objects = True
                key = obj['Key']

                # Read the object's content and add it to the archive
                content = s3.get_object(Bucket=source_bucket, Key=key)['Body'].read()
                zipf.writestr(key, content)

    if found_objects:
        # Upload the finished archive to the target bucket
        zip_buffer.seek(0)
        s3.upload_fileobj(zip_buffer, target_bucket, zip_file_name)
    else:
        print("No objects found in the source bucket.")

answered 8 months ago
EXPERT · reviewed 24 days ago
    1. Did you get the descriptors of the two scripts backward? The top one looks like it makes individual zip files and the bottom one a single zip, based on the variable holding the zip name in the bottom one and the target_key = key + '.zip' in the top.

    2. If my bucket contained a collection of folders, would each folder be zipped into a separate zip file in the target bucket? Or would it recursively go through each folder and make a thousand little zips?

    Thanks.

  • I used your script to zip up select folders. The script only handled a single folder path inside the target folder: if folderA contains six subfolders, only the first subfolder was copied into the zip. It does give me a starting point, though. Going to try something with shutil.make_archive.
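
On the folder question above: S3 has no real folders, only key prefixes, so "one zip per folder" means grouping keys by their top-level prefix. Below is a minimal sketch of that grouping (not from either answer above; the bucket names are placeholders), using list_objects_v2 with Delimiter='/' to find the top-level prefixes:

import boto3
import zipfile
import io

s3 = boto3.client('s3')

def zip_each_top_level_prefix(source_bucket, target_bucket):
    # Delimiter='/' makes S3 group keys by their first path segment;
    # each entry in CommonPrefixes is one top-level "folder"
    top_level = s3.list_objects_v2(Bucket=source_bucket, Delimiter='/')
    paginator = s3.get_paginator('list_objects_v2')

    for common_prefix in top_level.get('CommonPrefixes', []):
        prefix = common_prefix['Prefix']  # e.g. 'folderA/'
        zip_buffer = io.BytesIO()
        with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zipf:
            # Paginate so folders with more than 1,000 objects are fully included
            for page in paginator.paginate(Bucket=source_bucket, Prefix=prefix):
                for obj in page.get('Contents', []):
                    body = s3.get_object(Bucket=source_bucket, Key=obj['Key'])['Body'].read()
                    zipf.writestr(obj['Key'], body)
        zip_buffer.seek(0)
        s3.upload_fileobj(zip_buffer, target_bucket, prefix.rstrip('/') + '.zip')

Because the Prefix filter matches everything under a top-level prefix, nested subfolders all end up inside that prefix's zip rather than producing "a thousand little zips".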
