Discrepancy in S3 Object count

Hello,

I have an S3 bucket with over 12 TB of data. The S3 metrics show it has over 10 million objects. I needed to list all of the objects in this bucket, so I wrote a Python script for it.

I ran the code 3 times, but it stopped returning a NextContinuationToken after about 3 million records, so I cannot list all the objects in my bucket; I am only able to list 3 million records. I even tried running the code with the StartAfter parameter, using the Key of the last record fetched previously, and still got no NextContinuationToken. Here's my code:

============= Python Code =============
import boto3
import pandas as pd

client = boto3.client("s3")
continuation_token = None
record_count = 0

payload = dict(
    Bucket='...',
    MaxKeys=1000
)

while True:

    print("Fetching records...")

    # Pass the continuation token from the previous page, if there is one
    if continuation_token:
        payload.update(
            ContinuationToken=continuation_token
        )

    response = client.list_objects_v2(**payload)
    contents = response.get("Contents", [])

    # Dump this page of keys to CSV
    if contents:
        pd.DataFrame(contents).to_csv("./s3_objects.csv", index=False, mode="a", header=False)

    # Update record count
    record_count += len(contents)
    print("Total records fetched", record_count, "\n")

    # Updating continuation token; stop when S3 returns no further pages
    continuation_token = response.get("NextContinuationToken")
    print(continuation_token)
    if not continuation_token:
        exit("Process Finished")

============= Python Code =============
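
For comparison, the same listing can also be written with boto3's built-in paginator, which requests each page and follows NextContinuationToken automatically; this is just a minimal sketch with the same placeholder bucket name and CSV path:

============= Python Code =============
import boto3
import pandas as pd

client = boto3.client("s3")
paginator = client.get_paginator("list_objects_v2")

record_count = 0

# The paginator fetches each page and follows NextContinuationToken internally,
# stopping by itself once the last page has been returned
for page in paginator.paginate(Bucket="...", PaginationConfig={"PageSize": 1000}):
    contents = page.get("Contents", [])
    if contents:
        pd.DataFrame(contents).to_csv("./s3_objects.csv", index=False, mode="a", header=False)
    record_count += len(contents)

print("Total records fetched", record_count)
============= Python Code =============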

Please help me understand and fix this problem.

Thank you so much!

Update: Tried using the AWS CLI on Ubuntu 18.04 LTS
Command: aws s3 ls s3://... --recursive --summarize --human-readable > total_objects.txt
Result (at the end of the file):
Total Objects: 3036799
Total Size: 2.2 TiB

Note: I've stripped out the original bucket name and replaced it with "..."

Edited by: ISanV on May 5, 2021 8:06 AM

ISanV
asked 3 years ago

1 Answer

I discovered that Bucket Versioning was enabled, which ramped up the costs about 6 times. The problem is solved now: I deleted all the previous versions using a Bucket Lifecycle Rule and suspended Bucket Versioning.
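
For anyone running into the same discrepancy, a minimal sketch of the version check and cleanup with boto3 (the bucket name is a placeholder; the rule ID and the 1-day noncurrent-version expiry are just example values):

============= Python Code =============
import boto3

client = boto3.client("s3")
bucket = "..."  # placeholder bucket name

# 1. Count all versions and delete markers to confirm they explain the gap
#    between the metrics count and the number of current objects
versions = 0
delete_markers = 0
for page in client.get_paginator("list_object_versions").paginate(Bucket=bucket):
    versions += len(page.get("Versions", []))
    delete_markers += len(page.get("DeleteMarkers", []))
print("Versions:", versions, "Delete markers:", delete_markers)

# 2. Lifecycle rule that expires noncurrent versions (NoncurrentDays=1 is only an example)
#    and removes delete markers that no longer have noncurrent versions behind them
client.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    },
)

# 3. Suspend versioning so new overwrites stop piling up noncurrent versions
client.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Suspended"},
)
============= Python Code =============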

This has been solved now.

ISanV
answered 3 years ago
