Hello,
You can use S3 Storage Lens, a cloud storage analytics feature that provides a single view of object storage usage and activity across the AWS accounts in your organization. It offers organization-wide visibility and actionable recommendations for cost optimization and data-protection best practices. The advanced metrics tier adds features such as prefix-level aggregation and 15 months of historical data, along with many other capabilities. Please have a look at the S3 Storage Lens documentation for more clarity.
Hope this helps!
Hello,
My understanding of your question is that you want to find out whether there is any way, other than the GUI or CLI, to get the size of your bucket and its subfolders.
I would suggest using Boto3 (the AWS SDK for Python). You can write a simple script that fetches the required results.
I can provide you a rough script (bucket_name and prefix are placeholders you'll need to fill in):

import boto3
from collections import defaultdict

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects')

bucket_name = 'my-bucket'  # replace with your bucket name
prefix = ''                # optionally restrict to a prefix

subfolder_sizes = defaultdict(int)

try:
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get('Contents', []):
            key = obj['Key']
            # Extract subfolder path
            subfolder = '/'.join(key.split('/')[:-1])
            subfolder_sizes[subfolder] += obj['Size']
    print(dict(subfolder_sizes))
except Exception as e:
    print(f"Error calculating subfolder sizes: {e}")
Feel free to go through the Boto3 documentation and adapt this to get the desired results: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
I don't think the CLI is getting stuck. It's just asking S3 to list the objects matching your prefix filter, with each request returning data for a maximum of 1,000 objects. With a large number of objects, lots of requests are needed to loop through them all, and that takes time. The same 1,000-object limit per listing operation applies to list requests you'd make in your own code written in Python or another language. You can run multiple ListObjectsV2 calls in parallel to increase performance to some extent, but S3 enforces limits and starts to throttle you if you exceed them. You didn't mention how many objects your bucket contains, so I can't say how feasible parallelising the list operations would be.
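To make the pacing and parallelism above concrete, here's a minimal sketch in Python. The bucket and prefix names are placeholders, and the worker count is an assumption you would tune downward if S3 starts throttling you:

```python
from concurrent.futures import ThreadPoolExecutor

def total_size_for_prefix(s3_client, bucket, prefix):
    """Sum object sizes under a prefix by paging through ListObjectsV2.

    Each page returns at most 1,000 objects, so a large bucket means
    many sequential requests."""
    paginator = s3_client.get_paginator('list_objects_v2')
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            total += obj['Size']
    return total

def total_sizes_parallel(s3_client, bucket, prefixes, max_workers=8):
    """Run the listings for several prefixes concurrently.

    Keep max_workers modest: S3 enforces request-rate limits and will
    throttle if you exceed them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        sizes = pool.map(
            lambda p: total_size_for_prefix(s3_client, bucket, p), prefixes)
    return dict(zip(prefixes, sizes))

# Usage (assumes boto3 is installed and credentials are configured):
# import boto3
# s3 = boto3.client('s3')
# print(total_sizes_parallel(s3, 'my-bucket', ['folder1/', 'folder2/']))
```

This only parallelises across prefixes you already know about (for example, top-level "folders"), so the speed-up depends on how evenly your objects are distributed among them.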
The easiest thing you can do, if the number of objects isn't astronomically large, is to use the CLI as you have been doing, and allow it to take its time to finish processing all the objects. It may easily take many hours to process a huge number of objects.
The only approach that works for arbitrarily large numbers of objects is to use the S3 Inventory feature. It's a built-in capability of S3 which produces an object list on your behalf. S3 has internal mechanisms to parallelise the process across S3's underlying server fleet, allowing it to process a bucket of any size in a finite amount of time.
Setting up the inventory for the first time takes a bit of doing, though, so this isn't anywhere near as easy as just waiting for the CLI to finish. However, the inventory feature works for buckets of any size.
To get started with the inventory, you should create and configure a separate bucket that will receive the inventory reports. Then configure and schedule the S3 inventory report, selecting the fields that you want it to contain for each object. Set the report format to ORC, so that it can be analysed efficiently with Athena. You can set the report to be delivered at most once per day. It'll start at midnight UTC and may take several hours to finish. The concept of S3 Inventory and the detailed process for setting it up is explained in S3's documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html
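The inventory configuration described above can also be set up programmatically. The following is a sketch only, assuming boto3 and placeholder bucket names; the dict shape matches S3's PutBucketInventoryConfiguration API:

```python
def build_inventory_configuration(inventory_id, destination_bucket_arn, dest_prefix):
    """Build a daily ORC inventory configuration dict in the shape expected
    by S3's PutBucketInventoryConfiguration API."""
    return {
        'Id': inventory_id,
        'IsEnabled': True,
        'IncludedObjectVersions': 'Current',
        'Schedule': {'Frequency': 'Daily'},
        'Destination': {
            'S3BucketDestination': {
                'Bucket': destination_bucket_arn,  # e.g. 'arn:aws:s3:::my-inventory-bucket' (placeholder)
                'Format': 'ORC',                   # ORC can be queried efficiently with Athena
                'Prefix': dest_prefix,
            }
        },
        # 'Size' is the field needed for the size calculations discussed here
        'OptionalFields': ['Size', 'LastModifiedDate', 'StorageClass'],
    }

# Usage (assumes boto3 and credentials; all names are placeholders):
# import boto3
# s3 = boto3.client('s3')
# s3.put_bucket_inventory_configuration(
#     Bucket='my-source-bucket',
#     Id='daily-orc-inventory',
#     InventoryConfiguration=build_inventory_configuration(
#         'daily-orc-inventory', 'arn:aws:s3:::my-inventory-bucket', 'inventory/'))
```

Remember that the destination bucket also needs a bucket policy allowing S3 to deliver the reports, as described in the documentation linked above.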
Once the first S3 inventory report has been delivered, you can analyse it with Amazon Athena. If you haven't used Athena before, you'll first need to set up another S3 bucket to store the results of the queries you run with Athena. The process for setting up the result bucket is explained here: https://docs.aws.amazon.com/athena/latest/ug/querying.html#query-results-specify-location-console. This is also a one-time procedure per region.
Finally, create a table in the Athena database for the inventory. This teaches Athena the structure of the inventory reports and makes it possible to query them. The procedure is explained in detail here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory-athena-query.html
All the steps above only need to be done once. After they're done, S3 Inventory will automatically deliver a list of all the objects in your bucket once per day, and you can query the latest inventory report any time you like with a simple SQL query in Athena.
If your table is called my_bucket_inventory and you want to calculate the total size of all objects with the prefix folder1/ in the report for August 3rd, 2024, the query would be something like this:
select sum(cast(size as bigint)) as total_size
from my_bucket_inventory
where
dt='2024-08-03'
and starts_with(key, 'folder1/')
The query should only take some tens of seconds to complete, even for a huge number of objects.
You can use regular SQL to query the report in other ways, if you'd like. For example, you could search for objects by a part of their name (key) with a query like this, which looks for objects whose key contains the text "some-phrase", compared case-insensitively (because key is converted to lowercase before the comparison):
select *
from my_bucket_inventory
where
dt='2024-08-03'
and lower(key) like '%some-phrase%'
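If you want to run these queries from code rather than the Athena console, a rough sketch with boto3 follows. The table, database, and result-bucket names are placeholders, and in real use you should validate inputs rather than interpolating them into SQL:

```python
def inventory_size_query(table, dt, prefix):
    """Build the total-size SQL shown above for a given inventory partition
    date and key prefix.

    NOTE: inputs are interpolated directly for brevity; validate or escape
    them before using this with untrusted values."""
    return (
        f"select sum(cast(size as bigint)) as total_size "
        f"from {table} "
        f"where dt='{dt}' and starts_with(key, '{prefix}')"
    )

# Running it via the Athena API (assumes boto3, credentials, and an existing
# query-result bucket; all names are placeholders):
# import boto3, time
# athena = boto3.client('athena')
# qid = athena.start_query_execution(
#     QueryString=inventory_size_query('my_bucket_inventory', '2024-08-03', 'folder1/'),
#     QueryExecutionContext={'Database': 'default'},
#     ResultConfiguration={'OutputLocation': 's3://my-athena-results/'},
# )['QueryExecutionId']
# while (athena.get_query_execution(QueryExecutionId=qid)
#        ['QueryExecution']['Status']['State'] in ('QUEUED', 'RUNNING')):
#     time.sleep(2)
# print(athena.get_query_results(QueryExecutionId=qid))
```

Polling with get_query_execution is the simple approach; for scheduled reporting you could instead trigger this from a Lambda function or Step Functions state machine.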

I created a PowerShell script that generates a CSV with the sizes of all folders, using the AWS CLI's recursive ls: https://github.com/Pedro-Bat/Bucket_Size_AWSCLI.ps1