Free way to get exact S3 storage usage by storage class (near real-time)

Hi Community,

I’m looking for a free method to determine how much storage is used in an Amazon S3 bucket and how much data exists in each storage class (Standard, Standard-IA, Glacier, Glacier IR, Deep Archive, etc.).

My requirements are:

  • Get exact storage usage per storage class
  • Preferably near real-time or frequently updated data
  • Should work with large buckets containing millions of objects
  • Prefer a free solution without additional AWS service costs

So far I have explored a few options:

  • CloudWatch metrics (BucketSizeBytes): shows storage usage by storage type, but metrics are updated only once per day
  • S3 Inventory + Athena: provides accurate object metadata including storage class, but inventory reports are generated daily
  • AWS CLI listing (aws s3 ls --recursive): gives total size but does not break down by storage class

My question is:

👉 Is there any free AWS-native method to get accurate S3 storage usage per storage class, preferably with near real-time data, without scanning millions of objects manually?

If anyone has implemented a script, AWS CLI approach, or monitoring method to solve this efficiently, I’d appreciate your suggestions.

Thanks in advance. Manohar.

  • If my answer helped solve your problem, I would appreciate it if you click on “accepted answer”.

2 Answers

As I understand it, S3 Storage Lens (Free Tier) is technically the best "out-of-the-box" free option, but its 24-hour latency is the dealbreaker for "near real-time" needs. That is the same daily-update limit Manohar already noted for CloudWatch metrics and S3 Inventory.

Regarding "Glacier & Deep Archive Blind Spot"

One critical detail often missed is that objects moved to S3 Glacier or S3 Glacier Deep Archive are "archived." This means:

  • Standard Metadata Access: A LIST response already returns each object's key, size, and storage class, and HeadObject returns metadata for archived objects without a restore. Only reading the object data itself (GetObject) requires a restore. The real problem for scripts is scale: walking millions of objects means a large number of API calls on every run.
  • The S3 Inventory Advantage: S3 Inventory is the only reliable and cost-effective way to get metadata (like size and storage class) for millions of archived objects without incurring massive API overhead or restore costs. It provides a flat file (CSV, ORC, or Parquet) containing the metadata for every object in your bucket, including those in deep archive.

"Amazon S3 inventory provides comma-separated values (CSV), Apache Optimized Row Columnar (ORC) or Apache Parquet output files that list your objects and their corresponding metadata on a daily or weekly basis... S3 inventory is one of the most efficient ways to manage your storage, as it avoids the need to perform expensive synchronous List requests."

Source: https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html
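
To make the inventory approach concrete, here is a minimal Python sketch that sums object sizes per storage class from one inventory CSV file. It assumes the CSV has already been downloaded and decompressed, that the inventory was configured with the optional Size and StorageClass fields, and that the column order is taken from the `fileSchema` string in the report's manifest.json. The function name and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def usage_by_storage_class(csv_text, file_schema):
    """Sum object sizes (bytes) per storage class from S3 Inventory CSV text.

    file_schema is the comma-separated column list from the inventory
    manifest, e.g. "Bucket, Key, Size, StorageClass". Inventory CSV files
    have no header row, so the schema is the only source of column order.
    """
    columns = [name.strip() for name in file_schema.split(",")]
    size_idx = columns.index("Size")
    class_idx = columns.index("StorageClass")

    totals = defaultdict(int)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        size = row[size_idx]
        # Rows such as delete markers can have an empty size; count them as 0.
        totals[row[class_idx]] += int(size) if size else 0
    return dict(totals)


# Hypothetical sample rows, for illustration only.
sample_schema = "Bucket, Key, Size, StorageClass"
sample_csv = (
    '"my-bucket","logs/a.txt","100","STANDARD"\n'
    '"my-bucket","logs/b.txt","250","GLACIER"\n'
    '"my-bucket","logs/c.txt","50","STANDARD"\n'
)
print(usage_by_storage_class(sample_csv, sample_schema))
```

For very large buckets the same aggregation is usually easier to run as a GROUP BY query in Athena over the inventory files, as mentioned in the question.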

The fundamental issue is that S3 is an object store, not a file system. To give you a "real-time" sum of millions of objects, AWS would have to scan metadata constantly, which costs compute power. Since you want to avoid costs, here are three deeper perspectives:

1. The "Event-Driven" Workaround (Near Real-Time & Low Cost)

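If you need to track changes as they happen without waiting 24 hours, you can build a simple monitoring pipeline: S3 Event Notifications invoke a Lambda function, which updates running totals per storage class in a DynamoDB table. At modest event volumes this is often covered by the AWS Free Tier.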

Benefit: This gives you a dashboard whose totals reflect new activity within seconds to minutes of each change. Note: You would need to run a one-time S3 Inventory report to get the "starting balance" for your existing millions of objects.
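
The core bookkeeping of such a pipeline can be sketched in Python as follows. This is only an illustration under several assumptions: the two dictionaries stand in for the DynamoDB table, the function name is hypothetical, the bucket is unversioned, and the storage class is passed in by the caller because the event record carries the object key and size but not the storage class (a real handler would need to look it up, for example with HeadObject, or assume the upload default).

```python
from urllib.parse import unquote_plus


def apply_record(totals, index, record, storage_class="STANDARD"):
    """Apply one S3 event notification record to per-class byte totals.

    totals: {storage_class: bytes}, stand-in for DynamoDB counters.
    index:  {key: (storage_class, size)}, needed because delete events
            do not carry the size of the removed object.
    storage_class: ASSUMPTION, supplied by the caller (not in the event).
    """
    event_name = record["eventName"]
    obj = record["s3"]["object"]
    # Object keys arrive URL-encoded in event notifications.
    key = unquote_plus(obj["key"])

    # Subtract whatever we previously counted for this key (overwrite or delete).
    previous = index.pop(key, None)
    if previous is not None:
        old_class, old_size = previous
        totals[old_class] = totals.get(old_class, 0) - old_size

    # For creations, record the new size under the given storage class.
    if event_name.startswith("ObjectCreated:"):
        size = obj.get("size", 0)
        index[key] = (storage_class, size)
        totals[storage_class] = totals.get(storage_class, 0) + size
    return totals


totals, index = {}, {}
put = {"eventName": "ObjectCreated:Put",
       "s3": {"object": {"key": "reports/a+b.csv", "size": 100}}}
delete = {"eventName": "ObjectRemoved:Delete",
          "s3": {"object": {"key": "reports/a+b.csv"}}}
apply_record(totals, index, put)
apply_record(totals, index, delete)
print(totals)
```

Lifecycle transitions between storage classes would need their own event handling on top of this; the sketch only covers creations, overwrites, and deletes.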

2. Why you should avoid CLI (ls --recursive)

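For buckets with millions of objects, running a manual scan via CLI is not strictly free. You are charged for LIST requests (currently $0.005 per 1,000 requests), and each LIST request returns at most 1,000 keys. Scanning 10 million objects therefore takes about 10,000 requests, or roughly $0.05 per scan in API fees. The bigger problem is speed: a full listing is slow on very large buckets, so repeating it often enough to approximate near real-time data is impractical.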
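
As a rough check of the LIST request arithmetic, here is a small Python sketch. It assumes the quoted price of $0.005 per 1,000 LIST requests and that each LIST request returns up to 1,000 keys; the function name is hypothetical and actual pricing varies by region and storage class.

```python
import math


def list_scan_cost(object_count, price_per_1000_requests=0.005, keys_per_request=1000):
    """Estimate the number of LIST requests and their cost for one full scan."""
    requests = math.ceil(object_count / keys_per_request)
    cost = requests * price_per_1000_requests / 1000
    return requests, cost


requests, cost = list_scan_cost(10_000_000)
print(f"{requests} requests, about ${cost:.2f} per scan")
```

Under these assumptions, 10 million objects need about 10,000 LIST requests, which comes to about $0.05 per full scan.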

3. S3 Storage Lens Advanced (The "Almost Free" Alternative)

While you asked for a free solution, S3 Storage Lens Advanced Metrics provides more frequent updates and CloudWatch publishing.

  • It costs $0.20 per million objects per month.
  • If you have 5 million objects, for $1/month, you get significantly better visibility than the free tier, which might be a fair trade-off compared to the engineering effort of a custom Lambda solution.

If you strictly want 0 USD cost, you must accept the 24-hour delay of the standard Storage Lens. If you need "Near-Real-Time (NRT)", the "Event + Lambda + DynamoDB" approach is the most professional way to solve this within the Free Tier limits.

EXPERT
answered 2 months ago

Based on the available AWS-native methods, there isn't a completely free solution that provides near real-time storage usage breakdown by storage class for S3 buckets. However, here are your best free options:

Amazon S3 Storage Lens (Free Metrics) is your best free option. It provides 62 metrics across various categories at the bucket level, including storage class breakdowns. The free tier includes metrics for cost optimization, data protection, and access management. However, the data is available for queries for up to 14 days, and while it's updated regularly, it's not truly near real-time.

Amazon CloudWatch offers storage metrics including BucketSizeBytes by StorageType, which shows usage per storage class. This is free for S3 metrics, but as you've noted, CloudWatch records Amazon S3 metrics once each day, so it won't provide near real-time data.
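
As an illustration of the CloudWatch option, the following Python sketch builds the request parameters for reading BucketSizeBytes for one storage type. The helper name and bucket name are hypothetical, and the StorageType value shown ("StandardStorage") is one example; other storage classes use their own StorageType values. The actual call needs boto3 and AWS credentials, so it is only shown in a comment.

```python
from datetime import datetime, timedelta, timezone


def bucket_size_query(bucket, storage_type, now=None, days=2):
    """Build keyword arguments for CloudWatch GetMetricStatistics.

    The window spans a couple of days because S3 storage metrics are
    reported once per day, so a short window could contain no datapoint.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/S3",
        "MetricName": "BucketSizeBytes",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": storage_type},
        ],
        "StartTime": now - timedelta(days=days),
        "EndTime": now,
        "Period": 86400,  # one day, matching the daily reporting interval
        "Statistics": ["Average"],
    }


params = bucket_size_query("my-example-bucket", "StandardStorage")
print(params["MetricName"], params["Dimensions"])

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   response = boto3.client("cloudwatch").get_metric_statistics(**params)
#   datapoints = response["Datapoints"]
```

Running one query per StorageType value and taking the most recent datapoint gives a per-class breakdown, with the once-per-day update limit described above.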

For near real-time data, you would need to use S3 Storage Lens with Advanced Metrics, which includes activity metrics and more frequent updates. However, this comes with additional charges for per-object monitoring.

The fundamental limitation is that AWS doesn't offer a free, near real-time method for tracking storage usage by storage class. The daily update frequency in CloudWatch and S3 Storage Lens free metrics is the standard for free monitoring. For large buckets with millions of objects, manually scanning objects would be impractical and potentially costly due to API request charges.

Your best free approach would be to use S3 Storage Lens free metrics or CloudWatch metrics, accepting the daily update limitation, as these provide accurate storage class breakdowns without additional costs beyond standard S3 storage.
Sources
Amazon S3 Features – Storage Lens – AWS
Monitoring your storage activity and usage with Amazon S3 Storage Lens - Amazon Simple Storage Service
Calculate Amazon S3 bucket size and number of objects | AWS re:Post
Viewing metrics with Amazon S3 Storage Lens - Amazon Simple Storage Service

answered 2 months ago
EXPERT
reviewed 2 months ago
