Cost-efficient way to move millions of files to another storage class

I want to move 200 million files from the Standard storage class to Glacier_IR. I know that I can use lifecycle rules, and I plan to do that for any new files.

But to transition all the existing files into Glacier_IR with lifecycle rules, assuming region=us-east-1, we'd have to pay $0.02 per 1,000 files, which comes to $0.02/1000*200m = $4000.

Would using something like

aws s3 cp --recursive --storage-class GLACIER_IR s3://sample s3://sample

be more cost-efficient? Would we then pay the Standard-class price for COPY operations instead? That would be $0.005 per 1,000 files, so 4 times cheaper: $0.005/1000*200m = $1000.

(Of course, the aws s3 cp command is just an example. Something slightly more sophisticated would need to be built with the AWS SDK, taking object size and age into account and letting us resume the process from where we left off. In the end, though, it would essentially use only LIST and COPY operations, so the cost calculation should be about the same as with aws s3 cp.)
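The size, age, and resume logic described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `select_for_transition`, the thresholds `MIN_SIZE_BYTES` and `MIN_AGE`, and the checkpoint parameter `resume_after` are all hypothetical, and the input records are assumed to be dicts shaped like the entries an S3 listing returns ("Key", "Size", "LastModified", "StorageClass"). The actual LIST and COPY calls are left out.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; tune them to your own data.
MIN_SIZE_BYTES = 128 * 1024   # assumed cutoff below which objects are skipped
MIN_AGE = timedelta(days=30)  # assumed minimum age before an object is moved


def select_for_transition(objects, now, resume_after=None):
    """Yield the keys that should be copied into GLACIER_IR.

    `objects` is an iterable of dicts with the keys "Key", "Size",
    "LastModified" and "StorageClass", in ascending key order.
    `resume_after` is the last key handled by a previous run, or None.
    """
    for obj in objects:
        if resume_after is not None and obj["Key"] <= resume_after:
            continue  # already handled in an earlier run
        if obj["StorageClass"] != "STANDARD":
            continue  # only objects still in Standard need to move
        if obj["Size"] < MIN_SIZE_BYTES:
            continue  # too small according to the assumed cutoff
        if now - obj["LastModified"] < MIN_AGE:
            continue  # too recent according to the assumed cutoff
        yield obj["Key"]


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    sample = [
        {"Key": "a/old-big", "Size": 5_000_000,
         "LastModified": datetime(2023, 1, 1, tzinfo=timezone.utc),
         "StorageClass": "STANDARD"},
        {"Key": "b/new-big", "Size": 5_000_000,
         "LastModified": datetime(2024, 5, 25, tzinfo=timezone.utc),
         "StorageClass": "STANDARD"},
    ]
    print(list(select_for_transition(sample, now)))
```

Every key the function yields would then need one COPY request into the destination storage class, and persisting the last processed key is one simple way to make the job resumable.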

2 Answers
Accepted Answer

It won't matter whether you use lifecycle rules or COPY - the cost will be the same (with the CLI it might be slightly more, because you also pay for the LIST calls that the --recursive flag triggers).

When you COPY from S3 Standard to Glacier Instant Retrieval, you still pay $0.02 per 1,000 requests according to the S3 pricing page. You will NOT pay $0.005 per 1,000 requests (that is the price of a COPY into S3 Standard). There's no way to avoid that $4,000 if you put the objects into Glacier Instant Retrieval, because it costs AWS money to place them in that storage system on the backend.
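The comparison can be worked through as a rough calculation. The $0.02 and $0.005 per-1,000-request rates are the ones quoted in this thread. The LIST rate and the figure of 1,000 keys per LIST call are assumptions added here to estimate the CLI overhead, so check them against the current pricing page before relying on the result.

```python
from decimal import Decimal

OBJECTS = 200_000_000

# Rates per 1,000 requests in us-east-1, as quoted in this thread.
LIFECYCLE_TRANSITION_TO_GLACIER_IR = Decimal("0.02")
COPY_INTO_GLACIER_IR = Decimal("0.02")

# Assumptions not stated in the thread: LIST price and page size.
LIST_RATE = Decimal("0.005")
KEYS_PER_LIST_CALL = 1000


def request_cost(requests, rate_per_1000):
    """Return the cost in USD of `requests` requests at the given rate."""
    return Decimal(requests) * rate_per_1000 / 1000


lifecycle_cost = request_cost(OBJECTS, LIFECYCLE_TRANSITION_TO_GLACIER_IR)

# Ceiling division: number of LIST calls needed to enumerate every object.
list_calls = -(-OBJECTS // KEYS_PER_LIST_CALL)
copy_cost = (request_cost(OBJECTS, COPY_INTO_GLACIER_IR)
             + request_cost(list_calls, LIST_RATE))

print(f"lifecycle rules: ${lifecycle_cost:.2f}")
print(f"COPY plus LIST:  ${copy_cost:.2f}")
```

Under these assumptions the LIST overhead is small next to the per-object request charge, which is the same on both paths.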

The only other option is to consolidate those files first using something like EMR s3-dist-cp, so that instead of 200 million objects you potentially have only a few million. You can then use lifecycle rules to transition those objects, and the transition fees would be much lower.
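To illustrate why consolidation helps, here is a small sketch of how the transition fee scales with object count. The consolidated count of 2 million is a hypothetical stand-in for "a few million", and the calculation deliberately leaves out the cost of running EMR and of the requests needed to read and rewrite the data.

```python
from decimal import Decimal

# Lifecycle transition rate into Glacier Instant Retrieval quoted in the
# thread, in USD per 1,000 requests (us-east-1).
TRANSITION_RATE_PER_1000 = Decimal("0.02")


def transition_fee(object_count):
    """Return the lifecycle transition fee in USD for `object_count` objects."""
    return Decimal(object_count) * TRANSITION_RATE_PER_1000 / 1000


fee_before = transition_fee(200_000_000)
fee_after = transition_fee(2_000_000)  # hypothetical count after consolidation

print(f"before consolidation: ${fee_before:.2f}")
print(f"after consolidation:  ${fee_after:.2f}")
```

The fee is charged per object rather than per byte, so it shrinks in proportion to the number of objects, whatever their total size.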

AWS
Krishna
answered a month ago
EXPERT
Steve_M
reviewed a month ago
  • Agree with this answer. The COPY command will be doing a PUT into Glacier, so the Glacier price is the one to use.

  • Thanks a ton. I didn't realize that I would be paying the destination storage class's COPY price, not the source's, and that in this case it exactly matches the lifecycle transition cost.

This is not a direct answer to your question, which is about the cost of COPY operations, but this Community Article describes pretty much exactly what you want to do: https://repost.aws/articles/ARO4VRts2vRva3XVsbWrUyGw/optimizing-storage-costs-by-transitioning-millions-of-s3-objects-from-standard-to-glacier-tier

EXPERT
Steve_M
answered a month ago
