As mentioned in the Boto3 documentation, that API will only return up to 1,000 objects per call. At that point you have to call again, passing the NextContinuationToken value received in the response as the ContinuationToken parameter.
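To make the continuation-token loop concrete, here is a minimal sketch. The function name list_all_objects is made up for illustration, and it assumes you pass in a Boto3 S3 client (for example boto3.client("s3")); it is one way to write the loop, not the only one.

```python
def list_all_objects(client, bucket, prefix=""):
    """Collect every object under a prefix by following continuation tokens.

    `client` is assumed to be a Boto3 S3 client; the function name is
    hypothetical and only for illustration.
    """
    objects = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page holds no objects.
        objects.extend(response.get("Contents", []))
        if not response.get("IsTruncated"):
            return objects
        # Ask for the next page of up to 1,000 objects.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

Note that this holds the full listing in memory, which may matter for very large buckets.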
Another option would be to use a paginator which makes your code a little simpler.
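A rough sketch of the paginator approach, assuming a Boto3 S3 client is passed in. The function name latest_object is hypothetical; get_paginator("list_objects_v2") and paginate are the Boto3 calls that handle the continuation tokens for you.

```python
def latest_object(client, bucket, prefix=""):
    """Return the most recently modified object under a prefix, or None.

    `client` is assumed to be a Boto3 S3 client; the function name is
    hypothetical and only for illustration.
    """
    paginator = client.get_paginator("list_objects_v2")
    latest = None
    # The paginator issues the follow-up calls behind the scenes.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if latest is None or obj["LastModified"] > latest["LastModified"]:
                latest = obj
    return latest
```

This still reads every page, but it only keeps one object in memory at a time.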
That said: listing the contents of the bucket to find the latest object is not the most efficient way of doing it, especially if there are many objects in the bucket. If it were me (and this depends highly on whether you just want the latest object or, say, the ten latest objects), I would set up an S3 trigger for new objects that invokes a Lambda function, and have that function store the latest object in a DynamoDB table. Then you only have to query the table to get the latest object.
If you needed the latest 10 objects then the logic is a little more complex in the Lambda function but not particularly so. You might need to be careful if there are multiple uploads to S3 at the same time.
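As an illustration of the trigger idea, here is a sketch of the core of such a Lambda function. Everything about the table layout is an assumption: a single item with partition key pk set to "latest", and attributes bucket_name, object_key and event_time. The function name record_latest is also made up. The conditional write is one possible way to cope with simultaneous uploads: an event only overwrites the stored item if its eventTime is newer.

```python
from urllib.parse import unquote_plus


def record_latest(table, event):
    """Store the newest S3 object from an S3 event notification.

    `table` is assumed to be a Boto3 DynamoDB Table resource whose
    partition key is named "pk" (a hypothetical schema).
    """
    for record in event.get("Records", []):
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        event_time = record["eventTime"]  # ISO 8601 string, sorts as text
        try:
            table.put_item(
                Item={
                    "pk": "latest",
                    "bucket_name": record["s3"]["bucket"]["name"],
                    "object_key": key,
                    "event_time": event_time,
                },
                # Only overwrite when this event is newer than what is stored.
                ConditionExpression="attribute_not_exists(pk) OR event_time < :t",
                ExpressionAttributeValues={":t": event_time},
            )
        except table.meta.client.exceptions.ConditionalCheckFailedException:
            # A newer object was already recorded; ignore this older event.
            pass
```

Inside the Lambda handler this could be called with something like boto3.resource("dynamodb").Table("latest-objects"), where the table name is a placeholder. Reading the result back is then a single get_item call with Key={"pk": "latest"}.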
Thanks a lot Brettski-AWS. For now I have deleted the old objects and it's working fine. In some time, I will refactor the code to pull by the file name - I need to standardize the generated file name as <prefix>_batch_time rather than <prefix>_update_time. But just two questions:
- When sorting the objects, is there some way in which the sort runs on the entire bucket by last modified timestamp and the query returns just the first 1000? As in, some approach where the object fetch could be paginated but the sort works on the entire bucket?
- Just curious, why did it break only after a particular time each day?
Regards, dbeings
Instead of listing all files, you could use S3 Inventory: https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html, which provides a scheduled alternative to the Amazon S3 synchronous List API operations. Amazon S3 Inventory does not use the List API operations to audit your objects and does not affect the request rate of your bucket.
For (1) - yes, kind of, because you need to get all of the object names and sort on the client side. The sorting is not done server-side. For (2) - unsure; I would need to see the contents of the bucket, debug output, etc. to see what was going on. Note that you can use the Prefix parameter in your call to list_objects_v2 to limit it to a particular prefix.
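For what it's worth, a sketch of the client-side sort combined with Prefix might look like the following. The function name newest_objects is hypothetical and it assumes a Boto3 S3 client is passed in. It uses heapq.nlargest so that only the n newest entries are kept while the pages stream through, instead of sorting the whole listing.

```python
import heapq


def newest_objects(client, bucket, prefix="", n=10):
    """Return the n most recently modified objects under a prefix, newest first.

    `client` is assumed to be a Boto3 S3 client; the function name is
    hypothetical and only for illustration.
    """
    paginator = client.get_paginator("list_objects_v2")
    all_objects = (
        obj
        # Prefix narrows the listing server-side; the ordering is done here.
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
        for obj in page.get("Contents", [])
    )
    return heapq.nlargest(n, all_objects, key=lambda obj: obj["LastModified"])
```

Every object under the prefix is still listed once, so a narrower Prefix directly reduces the number of calls.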