Python Boto3 S3 multipart upload in multiple threads doesn't work


Hello, I am trying to upload a 113 MB (119,244,077 bytes) video to my bucket. It always takes 48 seconds, even when I use TransferConfig, so it seems that multithreaded uploading is not working. Any suggestions?

import logging
import time

import boto3
from boto3.s3.transfer import TransferConfig
from botocore.config import Config
from botocore.exceptions import ClientError

def upload_to_s3(file_name, bucket, path_s3):
    # Note: 1024 * 25 is 25 KB, well below S3's 5 MiB minimum part size;
    # boto3's transfer manager raises the chunk size to meet that minimum.
    config = TransferConfig(multipart_threshold=1024 * 25,
                            max_concurrency=10,
                            multipart_chunksize=1024 * 25,
                            use_threads=True)
    try:
        start_time = time.time()
        _ = s3_client.upload_file(file_name, bucket, path_s3, Config=config)
        elapsed_time = time.time() - start_time
        print(f"Time: {elapsed_time}")
    except ClientError as e:
        logging.error(e)
        return False

path_s3 = "something"
config = Config(connect_timeout=5, retries={'max_attempts': 0}, max_pool_connections=25)
s3_client = boto3.client('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY,
                         region_name=REGION_NAME, config=config)

# Set here the path of the file
path_file_to_upl = "./data/80e12098-ec85-59db-6e36-82e04e884439.mkv"

#Upload
upload_to_s3(path_file_to_upl, BUCKET_NAME,path_s3)

With the above code the upload takes 48-49 seconds; if I set use_threads=False the time increases to 71 seconds.

6 Answers
Accepted Answer

One way to check whether the multipart upload is actually using multiple streams is to run a utility like tcpdump on the machine the transfer is running on (for example, tcpdump -nn 'tcp port 443'). If multipart uploading is working you'll see more than one TCP connection to S3; if it isn't, you'll only see a single TCP connection.
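If you can't run tcpdump, a minimal alternative check (a sketch; it only assumes you can add two lines before the upload call) is to turn on botocore's debug logging and look for CreateMultipartUpload and UploadPart calls in the output:

    import logging

    import boto3

    # Log every request botocore makes. A working multipart upload shows one
    # CreateMultipartUpload, several parallel UploadPart calls, then
    # CompleteMultipartUpload.
    boto3.set_stream_logger('botocore', level=logging.DEBUG)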

Given that there is a speed difference (48 seconds vs 71 seconds) when you enable/disable multi-threading, I think that multipart upload is working.

The main reason for using multipart upload is to better utilise your available bandwidth, because (in general, and I'm skipping a lot of detail) a single TCP connection doesn't use all of it due to latency, window sizes and so on. Even then, there is a maximum transfer speed set by the bandwidth available.
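To make the latency point concrete, here is a rough back-of-the-envelope sketch; the window size and round-trip time are illustrative assumptions, not measurements from this case:

    # Single-stream TCP throughput is roughly capped at window / RTT
    # (the bandwidth-delay product argument).
    window_bytes = 64 * 1024   # assume an effective 64 KiB TCP window
    rtt_seconds = 0.05         # assume a 50 ms round trip to S3
    ceiling_mbps = window_bytes * 8 / rtt_seconds / 1e6
    print(f"~{ceiling_mbps:.1f} Mb/s per connection")  # ~10.5 Mb/s

With numbers like these, one connection alone can't fill even a modest uplink, which is why uploading several parts concurrently can help.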

Is it possible that you are using all of the bandwidth and that 48 seconds is the best time that is possible here? You haven't specified where you're uploading from (EC2 or somewhere external to AWS), nor the latency to S3, so it's difficult to tell.

I assume that your question is "how can I make my transfer go faster" but without other information it's not possible to say.

Adding here because it's neater

As a measure of "good" I'd try using the AWS CLI to do the same transfer (for example, time aws s3 cp <file> s3://<bucket>/<key>) and see what time you get from that.

For now, if you can transfer 115 MB in about 48 seconds that puts your upload speed at about 20 Mb/s (assuming very low latency to the region). What is the bandwidth on your home network to the Internet?
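For reference, the arithmetic behind that estimate, using the exact file size from the question (the only assumption is decimal megabits):

    size_bytes = 119_244_077   # file size from the question
    seconds = 48
    mbps = size_bytes * 8 / seconds / 1e6
    print(f"{mbps:.1f} Mb/s")  # ~19.9 Mb/s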

AWS
EXPERT
answered 2 years ago

I usually upload files that are about 130 GB. It's possible your exception handling might be the problem, too. Here's the code I use:

        config = TransferConfig(multipart_threshold=1000 * mb, max_concurrency=int(psutil.cpu_count() / 2),
                                multipart_chunksize=1000 * mb, use_threads=True)
        metadata = {'BackupTS': str(tstamp), 'Backup_Info': backup_info}
        extra_args = {'Metadata': metadata}
        for item in get_files_in_dir(source, mod_time, logging_root):
            if item.strip() != '':
                copy_response = copy_to_s3(s3, item, destination, str(item)[2:].replace('\\\\', '\\'),
                                           extra_args, None, config, tstamp, logging_root)

def copy_to_s3(s3, source, destination, key, extra_args, callback, config, tstamp, logging_root):
    upload_file_response = {}
    try:
        upload_file_response = s3.upload_file(source, destination, key, ExtraArgs=extra_args,
                                              Callback=callback, Config=config)
    except s3.exceptions.ClientError as err:
        generic_exception_handler("S3 Exception (upload file ClientError)", logging_root)
        raise err
    except botocore.exceptions.ClientError as err:
        generic_exception_handler("botocore Exception (upload file ClientError)", logging_root)
        raise err
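The excerpt above depends on helpers that aren't shown (mb, get_files_in_dir, generic_exception_handler, and so on). A self-contained sketch of just the TransferConfig part, with mb defined explicitly, might look like this:

    import psutil
    from boto3.s3.transfer import TransferConfig

    mb = 1024 ** 2  # the excerpt assumes an `mb` constant; 1 MiB here
    config = TransferConfig(multipart_threshold=1000 * mb,  # ~1 GB; smaller files go up in one piece
                            max_concurrency=int(psutil.cpu_count() / 2),  # half the logical CPUs
                            multipart_chunksize=1000 * mb,
                            use_threads=True)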
answered 2 years ago
  • I don't think it's a problem with the try; I ran the code even after removing the try, but nothing changed. The code looks the same to me, but a 115 MB file still takes me 48 seconds. Could there be something I have to do at the system level, or some way I need to configure the bucket on S3?

  • My point is that you specify except ClientError, when you should be catching either s3_client.exceptions.ClientError or botocore.exceptions.ClientError, or both. Try that.

  • The problem is not the exception handling.


Why do you have the function defined as def upload_to_s3(file_name, bucket, path_s3): when the call doesn't pass three parameters?

I use python to upload to S3 regularly with multipart uploads and it seems to work just fine.

answered 2 years ago
  • Yes, sorry, the parameter is missing, but that's not the problem. I copied and simplified my code just to show the important parts; I defined another function because I do other things before uploading the file. Did you also try with a 115 MB file? My upload is very slow; I'm connected via Ethernet with 19 Mbps upload.

  • My upload rate is about the same, if not a little lower. It can take more than a day to upload my large files.


Also, if you are using multipart uploads, you might want to set a lifecycle rule to delete failed multipart uploads. The orphaned parts are kept in your S3 bucket and you might be paying for them. If your costs go down after implementing this, you can also ask AWS for a refund of those costs: open a case with them, and they will want to know how far back in your account to go for the refund. The linked post says to use 7 days for the rule, but I would use 1 day; otherwise you'll have to wait 7 days for it to take effect, and you'll pay for that storage all that time, too.

https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/
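If you'd rather set the rule from Python than from the console, here is a minimal boto3 sketch; the bucket name is a placeholder and the rule applies to the whole bucket:

    import boto3

    s3 = boto3.client('s3')
    # Abort (and clean up) any multipart upload still incomplete after 1 day.
    s3.put_bucket_lifecycle_configuration(
        Bucket='my-bucket',  # placeholder
        LifecycleConfiguration={
            'Rules': [{
                'ID': 'abort-incomplete-multipart-uploads',
                'Status': 'Enabled',
                'Filter': {'Prefix': ''},  # match every object key
                'AbortIncompleteMultipartUpload': {'DaysAfterInitiation': 1},
            }]
        },
    )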

answered 2 years ago
  • I don't have these problems now


My bandwidth is this: [speed-test screenshot; roughly 20 Mb/s upload]

If I use the AWS CLI the time is the same.

answered 2 years ago
  • Looks like 20 Mb/s is the maximum you can get, so the upload speed you're achieving right now is maxing out your connection. No amount of optimisation is going to make it any better.


Yes, my question is "how can I make my transfer go faster". I am trying to upload videos of about 115 MB, using an ordinary PC connected to my home network. It seems that whatever parameters I pass to the method, e.g. the threshold or the number of threads, the time always stays around 48 seconds; something seems to be acting as a bottleneck. Do you have any advice?

It seems too long to upload such a small file. I can tell you that I am uploading to the AWS region that matches where I am geographically.

answered 2 years ago
