Continue S3 file upload from where it crashed

Hey

I ran this command to copy a huge file to S3:

aws s3 cp C:\path\to\file s3://my-bucket/ --storage-class DEEP_ARCHIVE --profile s3bucket --output text --debug --cli-connect-timeout 0

It ran OK for a while, but eventually crashed with:

2022-09-06 14:00:01,477 - ThreadPoolExecutor-0_0 - s3transfer.utils - DEBUG - Releasing acquire 0/None
upload failed: C:\path\to\file to s3://my-bucket/file Could not connect to the endpoint URL: "https://my-bucket.s3.us-west-2.amazonaws.com/file?uploadId=SoMeBiGiDeTc&partNumber=1670"
2022-09-06 14:00:01,481 - Thread-1 - awscli.customizations.s3.results - DEBUG - Shutdown request received in result processing thread, shutting down result thread.

Is there a way I can resume from where it got interrupted, without starting over?

--

If I run:

aws s3api list-multipart-uploads --bucket my-bucket --profile s3bucket --output json

I see the multipart there that crashed, but I don't know how to make it continue from there.
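
If it's useful, I assume I could also inspect the parts that already made it with list-parts, using the upload ID and key from the error above:

aws s3api list-parts --bucket my-bucket --key file --upload-id SoMeBiGiDeTc --profile s3bucket --output json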

Thank you!

4 Answers

Accepted Answer

It depends on what exception caused your process to crash. The article below covers setting up retries. I have the legacy retry mode configured and I hardly ever crash, even though my internet does go out now and then. It's written for boto3, but I'm sure it applies to the AWS CLI, too. Let me know if this helps.

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html

Answered 2 years ago
  • Thank you very much! I've now added:

    max_attempts = 20
    retry_mode = standard
    

    Hopefully this will resolve it :)
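
    For reference, those two lines go in the AWS shared config file (~/.aws/config on Linux/macOS, %UserProfile%\.aws\config on Windows) under the matching profile section, e.g.:

    # ~/.aws/config
    [profile s3bucket]
    max_attempts = 20
    retry_mode = standard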

Also, since you're using multipart uploads, you might want to create a lifecycle rule to save some money on storage for failed multipart uploads. Here's an article on that: https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/
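
A minimal sketch of such a rule via the CLI (the rule ID and the seven-day window are just example values, and the quoting below assumes a Unix-like shell):

# abort any multipart upload still incomplete 7 days after it started
aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --profile s3bucket --lifecycle-configuration '{
  "Rules": [
    {
      "ID": "abort-incomplete-mpu",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}'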

Answered 2 years ago
  • Thank you. I didn't know I could create a lifecycle rule for that. So far I've been running aws s3api abort-multipart-upload manually for each of the zombie multipart uploads that show up in aws s3api list-multipart-uploads :-) Thank you very much!
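
    In case it helps anyone else, a rough loop for that manual cleanup (assuming a Unix-like shell and keys without spaces; adjust for PowerShell):

    # list every incomplete upload as "key<TAB>upload-id", then abort each one
    aws s3api list-multipart-uploads --bucket my-bucket --profile s3bucket \
      --query 'Uploads[].[Key,UploadId]' --output text |
    while read -r key upload_id; do
      aws s3api abort-multipart-upload --bucket my-bucket --profile s3bucket --key "$key" --upload-id "$upload_id"
    done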

One more thing I should add. If you've been using multipart uploads in the past and you implement the lifecycle rule mentioned in my other answer, your costs per day may drop dramatically. Mine went from $2.50 per day to $0.31 per day. Open a case with AWS and tell them you were overcharged for incomplete multipart uploads that failed. They will ask how far back you were using multipart uploads, so be prepared to tell them. You may get a refund going back that far. Good luck. I got a $340 refund that way.

Answered 2 years ago

Hi there

You can also try a managed service like AWS DataSync for migrating large datasets. DataSync takes care of the tuning and retries for you. Using the S3 CLI for large transfers can require a fair bit of system tuning: you'll have to experiment with the number of concurrent Amazon S3 requests, the part size, possibly the TCP window size, and likely parallel invocations of the AWS CLI to match the throughput of AWS DataSync. Take a look at this blog post for some other transfer methods you can try:

https://aws.amazon.com/blogs/storage/migrating-and-managing-large-datasets-on-amazon-s3/
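
As a starting point for that tuning, the concurrency and part size live in the CLI's S3 configuration; the values below are only examples to experiment with:

# defaults are 10 concurrent requests and 8MB parts
aws configure set s3.max_concurrent_requests 20 --profile s3bucket
aws configure set s3.multipart_chunksize 64MB --profile s3bucket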

AWS
Expert
Matt-B
Answered 2 years ago
