S3: Efficient Way to List and Copy a Million-Object Collection


I am trying to list all the objects in my million-object S3 bucket that were uploaded in the last two years, and then sync them to a bucket in a new account and region (for disaster recovery). As a first step, I am using the following command:

aws s3api list-objects-v2 --bucket BUCKET_NAME --query "Contents[?contains(LastModified, 'YYYY-MM-DD')].Key"

However, this command takes days to run. Other than splitting it up and scanning month by month over the two-year period, is there a more efficient way to obtain the list of objects (and then sync them cross-account and cross-region)?

AWS
lovjim
Asked 5 months ago, viewed 232 times
2 Answers

Have a look at S3 Inventory: https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html

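S3 Inventory is designed for efficient, scalable reporting on your S3 objects. Instead of paging through LIST requests, it delivers a scheduled (daily or weekly) list of your objects and their metadata as CSV, ORC, or Parquet files, which you can then query externally. Note that the first report can take up to 48 hours to be delivered.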

Example configuration:

aws s3api put-bucket-inventory-configuration --bucket BUCKET_NAME --id inventory-id \
  --inventory-configuration '{"Id": "inventory-id", "IsEnabled": true, "IncludedObjectVersions": "Current", "Schedule": {"Frequency": "Daily"},
    "Destination": {"S3BucketDestination": {"Bucket": "arn:aws:s3:::REPORT_BUCKET_NAME", "Format": "CSV"}},
    "OptionalFields": ["Size", "LastModifiedDate", "ETag", "StorageClass", "IsMultipartUploaded", "ReplicationStatus"]}'
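Once a report has been delivered, it still has to be narrowed down to objects from the last two years. The following is a minimal sketch, not an official tool: it assumes the configuration above (current versions only), so each row of a data file starts with Bucket, Key, Size, LastModifiedDate. Confirm the actual column order against the fileSchema field in the report's manifest.json. The function name filter_inventory is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: filter S3 Inventory CSV data files down to "bucket,key" lines for
# objects whose LastModifiedDate is on or after a cutoff date.
# ASSUMPTION: column order is Bucket, Key, Size, LastModifiedDate, ...
# (check "fileSchema" in manifest.json; with IncludedObjectVersions=All,
# a VersionId column follows Key and the date moves to column 5).
filter_inventory() {
  local cutoff="$1"   # e.g. 2022-06-01; ISO 8601 timestamps compare correctly as strings
  shift
  # Inventory data files are gzip-compressed; gzip -dc streams them to stdout.
  gzip -dc "$@" | awk -F',' -v cutoff="$cutoff" '
    {
      gsub(/"/, "")   # strip CSV quoting (keys are URL-encoded, so no embedded commas)
      if (($4 "") >= cutoff) print $1 "," $2   # force a string comparison on the date
    }'
}
```

For example, with GNU date and the report's data files downloaded to a local inventory/ directory: `filter_inventory "$(date -u -d '2 years ago' +%F)" inventory/*.csv.gz > manifest.csv`. Splitting on commas is safe only because CSV inventory reports URL-encode object keys. The resulting bucket,key lines, with keys still URL-encoded, also match the CSV manifest format that S3 Batch Operations accepts.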
Answered 5 months ago

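Once the inventory has been generated (I've used the above approach for very large numbers of S3 objects), have you considered Cross-Region Replication (CRR) with S3 Replication Time Control (S3 RTC)? Keep in mind that a replication rule only applies to objects uploaded after the rule is enabled; objects that already exist in the bucket can be replicated with S3 Batch Replication.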

Cross-account / cross-region walkthrough: https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication-walkthrough-2.html
RTC walkthrough: https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication-walkthrough-5.html
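A minimal sketch of a replication configuration for such a rule, with RTC enabled. It assumes versioning is already enabled on both buckets, and it uses placeholder values (all hypothetical): SOURCE_ACCOUNT_ID, the IAM role REPLICATION_ROLE that S3 assumes to replicate, DEST_ACCOUNT_ID, and DEST_BUCKET_NAME. RTC requires both ReplicationTime and Metrics to be enabled with 15-minute values. The empty Prefix filter applies the rule to every object. Saved as replication.json, it could be applied with `aws s3api put-bucket-replication --bucket BUCKET_NAME --replication-configuration file://replication.json`. The destination bucket in the other account also needs a bucket policy that allows this role to replicate into it.

```json
{
  "Role": "arn:aws:iam::SOURCE_ACCOUNT_ID:role/REPLICATION_ROLE",
  "Rules": [
    {
      "ID": "dr-replication",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {"Prefix": ""},
      "DeleteMarkerReplication": {"Status": "Disabled"},
      "Destination": {
        "Bucket": "arn:aws:s3:::DEST_BUCKET_NAME",
        "Account": "DEST_ACCOUNT_ID",
        "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
        "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}
      }
    }
  ]
}
```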

AWS
KAS
Answered 5 months ago
