Continue S3 file upload from where it crashed

Hey

I ran this command to copy a huge file to S3:

aws s3 cp C:\path\to\file s3://my-bucket/ --storage-class DEEP_ARCHIVE --profile s3bucket --output text --debug --cli-connect-timeout 0

It ran fine for a while, but eventually crashed with:

2022-09-06 14:00:01,477 - ThreadPoolExecutor-0_0 - s3transfer.utils - DEBUG - Releasing acquire 0/None
upload failed: C:\path\to\file to s3://my-bucket/file Could not connect to the endpoint URL: "https://my-bucket.s3.us-west-2.amazonaws.com/file?uploadId=SoMeBiGiDeTc&partNumber=1670"
2022-09-06 14:00:01,481 - Thread-1 - awscli.customizations.s3.results - DEBUG - Shutdown request received in result processing thread, shutting down result thread.

Is there a way I can resume from where it got interrupted, without starting over?

--

If I run:

aws s3api list-multipart-uploads --bucket my-bucket --profile s3bucket --output json

I can see the crashed multipart upload listed there, but I don't know how to make it resume from that point.

Thank you!

4 Answers
Accepted Answer

It depends on what exception caused your process to crash. Here's an article on setting up retries. I have the legacy retry mode configured and I don't crash very often at all, even though my internet does go out now and then. I'm sure it applies to the AWS CLI, too. Let me know if this helps.

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html
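For the CLI, those retry settings go in `~/.aws/config` under the profile you use. A minimal sketch (the profile name matches the question; the values are illustrative starting points):

```ini
# ~/.aws/config -- retry settings apply per profile
[profile s3bucket]
region = us-west-2
retry_mode = standard
max_attempts = 20
```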

answered 3 months ago
  • Thank you very much! I've now added:

    max_attempts = 20
    retry_mode = standard
    

    Hopefully this will resolve that :)

Also, since you're using multipart uploads, you might want to create a lifecycle rule to save you some money on storage of failed multipart uploads. Here's an article on that: https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/
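A minimal sketch of such a rule, applied via the CLI (the bucket name matches the question; the rule ID and the 7-day window are illustrative, not recommendations):

```shell
# Abort any multipart upload still incomplete 7 days after it started
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --profile s3bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "abort-incomplete-mpu",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
    }]
  }'
```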

answered 3 months ago
  • Thank you. I didn't know a lifecycle rule could do that. I've been running aws s3api abort-multipart-upload manually for each of the zombie multipart uploads that show up in aws s3api list-multipart-uploads :-) Thank you very much!
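For anyone else cleaning these up by hand, the list and abort steps can be combined into one loop. A sketch, assuming jq is installed and using the bucket and profile names from the question:

```shell
# List every in-progress multipart upload and abort each one
aws s3api list-multipart-uploads --bucket my-bucket --profile s3bucket --output json \
  | jq -r '.Uploads[]? | [.Key, .UploadId] | @tsv' \
  | while IFS=$'\t' read -r key upload_id; do
      aws s3api abort-multipart-upload \
        --bucket my-bucket --profile s3bucket \
        --key "$key" --upload-id "$upload_id"
    done
```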


One more thing I should add. If you've been using multipart uploads in the past and you implement that lifecycle rule mentioned in my post, your costs per day may drop dramatically. Mine went from $2.50 per day to $0.31 per day. Open a case with AWS and tell them you were overcharged for incomplete multipart uploads that failed. They will ask you how far back you were using that, so be prepared to tell them that. You may get a refund going back that far. Good luck. I got a refund for $340 that way.

answered 3 months ago

Hi there,

You can also try a managed service like AWS DataSync for migrating large datasets. DataSync takes care of the tuning and retries for you. Using the S3 CLI for large transfers can require a fair bit of system tuning: you'll have to experiment with the number of concurrent Amazon S3 requests, the part size, possibly the TCP window size, and likely parallel invocations of the AWS CLI to match the throughput of AWS DataSync. Take a look at this blog post for some other transfer methods you can try:

https://aws.amazon.com/blogs/storage/migrating-and-managing-large-datasets-on-amazon-s3/
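If you do stick with the CLI, the tuning knobs mentioned above live in the CLI's s3 configuration section. A sketch (the values are illustrative starting points, not recommendations):

```shell
# Raise S3 transfer concurrency and part size for the default profile
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 64MB
```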

EXPERT
Matt-B
answered 3 months ago
