Discrepancy in S3 Object count

Hello,

I have an S3 bucket with over 12 TB of data. The S3 metrics show that it contains over 10 million objects. I needed a listing of every object in the bucket, so I wrote a Python script for it.

I ran the script three times, but it stops receiving a NextContinuationToken after about 3 million records, so I can only ever list 3 million objects instead of all of them. I even tried running it with the StartAfter parameter set to the Key of the last record it fetched previously, and still got no NextContinuationToken. Here's my code:

============= Python Code =============
import boto3
import pandas as pd

client = boto3.client("s3")
continuation_token = None
record_count = 0

payload = dict(
    Bucket='...',
    MaxKeys=1000
)

while True:

    print("Fetching records...")

    # Pass the token from the previous page, if there is one
    if continuation_token:
        payload.update(ContinuationToken=continuation_token)

    response = client.list_objects_v2(**payload)
    contents = response.get("Contents", [])

    # Append this page to the CSV
    pd.DataFrame(contents).to_csv("./s3_objects.csv", index=False, mode="a", header=False)

    # Update the running record count
    record_count += len(contents)
    print("Total records fetched:", record_count, "\n")

    # Stop once S3 reports there are no more pages
    continuation_token = response.get("NextContinuationToken")
    print(continuation_token)
    if not continuation_token:
        exit("Process Finished")
============= Python Code =============
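
For comparison, here is the same loop written with boto3's built-in paginator, which manages the ContinuationToken internally; a minimal sketch, using the same redacted bucket name and CSV path:

============= Python Code =============
import boto3
import pandas as pd

client = boto3.client("s3")

# get_paginator("list_objects_v2") tracks the continuation token itself
paginator = client.get_paginator("list_objects_v2")
record_count = 0

for page in paginator.paginate(Bucket="...", PaginationConfig={"PageSize": 1000}):
    contents = page.get("Contents", [])
    pd.DataFrame(contents).to_csv("./s3_objects.csv", index=False, mode="a", header=False)
    record_count += len(contents)
    print("Total records fetched:", record_count)
============= Python Code =============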

Please help me understand and fix this problem.

Thank you so much!

Update: I also tried the AWS CLI on Ubuntu 18.04 LTS.
Command: aws s3 ls s3://... --recursive --summarize --human-readable > total_objects.txt
Result (at the end of the file):
Total Objects: 3036799
Total Size: 2.2 TiB

Note: I've stripped out the original bucket name and replaced it with "...".

Edited by: ISanV on May 5, 2021 8:06 AM

ISanV
asked 3 years ago · 507 views
1 Answer

I discovered that Bucket Versioning was enabled on the bucket, which had also ramped up the costs about six times. That explains the discrepancy: the S3 metrics count every stored version (over 10 million), while list_objects_v2 and aws s3 ls return only the current versions (about 3 million). I deleted all the previous versions using a Bucket Lifecycle Rule and suspended Bucket Versioning.
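
For anyone hitting the same thing, one way to confirm that noncurrent versions account for the gap is to count versions instead of current objects; a minimal sketch with boto3's list_object_versions paginator (slow on a bucket this size, bucket name redacted as above):

============= Python Code =============
import boto3

client = boto3.client("s3")
paginator = client.get_paginator("list_object_versions")

current, noncurrent, delete_markers = 0, 0, 0
for page in paginator.paginate(Bucket="..."):
    # Each page can carry both object versions and delete markers
    for version in page.get("Versions", []):
        if version["IsLatest"]:
            current += 1
        else:
            noncurrent += 1
    delete_markers += len(page.get("DeleteMarkers", []))

print("Current versions:", current)
print("Noncurrent versions:", noncurrent)
print("Delete markers:", delete_markers)
============= Python Code =============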

This has been solved now.
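
Roughly what I did, sketched with boto3; the rule ID and the one-day retention window are illustrative choices, and lifecycle rules run asynchronously (S3 evaluates them about once a day), so the cleanup is not instant:

============= Python Code =============
import boto3

client = boto3.client("s3")

# Expire all noncurrent versions and remove orphaned delete markers.
# The rule ID and the 1-day window here are illustrative.
client.put_bucket_lifecycle_configuration(
    Bucket="...",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    },
)

# Versioning cannot be disabled once enabled, only suspended
client.put_bucket_versioning(
    Bucket="...",
    VersioningConfiguration={"Status": "Suspended"},
)
============= Python Code =============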

ISanV
answered 3 years ago
