Issue in accessing latest generated file in S3 bucket

Hi,

I generate a file in S3 every hour through a Lambda function, which finishes writing the file within a minute. Ten minutes after the first function runs, a second Lambda function, processLatestS3File(), is called.

This code works as expected until about 10:30 AM UTC each day, and the second Lambda function gets a reference to the latest generated file. After that, for every subsequent call that day, the second Lambda function always gets the 10:30 AM UTC file when it tries to get the latest file in the bucket.

This was working fine for the entire day a few weeks back, and I have made no changes to the code. I have checked that the first Lambda function is working and generates the required files every hour, even after 10:30 AM UTC. I also checked the modified timestamps in the S3 bucket, and files generated after 10:30 AM UTC have more recent modified timestamps than the 10:30 AM UTC file. Below is the code extract from processLatestS3File() that gets the most recently updated file in the bucket.

import boto3

s3 = boto3.client('s3')
response = s3.list_objects_v2(Bucket=my_bucket_name)
# Sort the returned objects newest-first by modification time
sorted_objects = sorted(response['Contents'], key=lambda obj: obj['LastModified'], reverse=True)
latest_object = sorted_objects[0]
print(latest_object['Key'])

Any pointers on what could be causing this?

Regards, dbeings

dbeing
asked 5 months ago
346 views
3 Answers
Accepted Answer

As mentioned in the Boto3 documentation, that API returns at most 1,000 objects per call. To get the rest, you have to call it again, passing the NextContinuationToken value received in the response as the ContinuationToken parameter.
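A minimal sketch of that loop, assuming a boto3 S3 client is passed in. The function name list_all_objects is only illustrative, and the IsTruncated flag is used here to decide when to stop:

```python
def list_all_objects(s3_client, bucket):
    """Collect every object in the bucket by following continuation tokens."""
    objects = []
    kwargs = {"Bucket": bucket}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # 'Contents' is absent when a page has no objects
        objects.extend(response.get("Contents", []))
        if not response.get("IsTruncated"):
            break
        # Ask for the next page, starting where this one ended
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
    return objects
```

It could be called as list_all_objects(boto3.client('s3'), my_bucket_name), and the sort from the question then applied to the full list rather than to the first page only.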

Another option would be to use a paginator which makes your code a little simpler.
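As a rough sketch of the paginator approach, assuming a boto3 S3 client is passed in (find_latest_object is a made-up helper name). It keeps only the newest object seen so far instead of sorting everything:

```python
def find_latest_object(s3_client, bucket):
    """Return the object dict with the newest LastModified, or None if the bucket is empty."""
    paginator = s3_client.get_paginator("list_objects_v2")
    latest = None
    # The paginator handles the continuation tokens behind the scenes
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if latest is None or obj["LastModified"] > latest["LastModified"]:
                latest = obj
    return latest
```

For example, find_latest_object(boto3.client('s3'), my_bucket_name)['Key'] would give the key of the most recently modified object, provided the bucket is not empty.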

That said, listing the contents of the bucket to find the latest object is not the most efficient way of doing this, especially if there are many objects in the bucket. If it were me (and this depends highly on whether you want just the latest object or, say, the ten latest objects), I would set up a trigger for new objects in S3 that invokes a Lambda function, which stores a reference to the latest object in a DynamoDB table. Then you only have to query the table to get the latest object.

If you needed the latest 10 objects then the logic is a little more complex in the Lambda function but not particularly so. You might need to be careful if there are multiple uploads to S3 at the same time.
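The trigger-based idea could look roughly like the sketch below. The table layout (a single item under the partition key "pk" = "latest"), the attribute names, and both helper names are assumptions made for illustration, not something the original setup defines. The table argument is expected to be a boto3 DynamoDB Table resource.

```python
from urllib.parse import unquote_plus

def record_latest_object(event, table):
    """Store the key of each newly created S3 object as the 'latest' item."""
    for record in event.get("Records", []):
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record["s3"]["object"]["key"])
        table.put_item(Item={
            "pk": "latest",  # hypothetical fixed partition key
            "bucket": record["s3"]["bucket"]["name"],
            "object_key": key,
            "event_time": record["eventTime"],
        })

def get_latest_object_key(table):
    """Read back the key written by record_latest_object, or None if nothing is stored."""
    item = table.get_item(Key={"pk": "latest"}).get("Item")
    return item["object_key"] if item else None
```

Inside the triggered Lambda handler this might be called as record_latest_object(event, boto3.resource('dynamodb').Table('latest-s3-object')), where the table name is again hypothetical. This sketch overwrites unconditionally, so with simultaneous uploads the last write wins; a conditional write comparing event_time would be one way to guard against that.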

AWS
EXPERT
answered 5 months ago
EXPERT
reviewed a month ago

Thanks a lot, Brettski-AWS. For now I have deleted the old objects and it's working fine. At some point I will refactor the code to fetch by file name; I need to standardize the generated file name to <prefix>_batch_time rather than <prefix>_update_time. But just two questions:

  1. When sorting the objects, is there some way to have the sort run over the entire bucket by last modified timestamp while the query returns just the first 1,000? That is, some approach where the object fetch is paginated but the sort works on the entire bucket?
  2. Just curious, why did it break only after a particular time each day?

Regards, dbeings

dbeing
answered 5 months ago
  • For (1) - yes, kind of, because you need to get all of the object names and sort them on the client side. The sorting is not done server-side. For (2) - unsure; I would need to see the contents of the bucket, debug output, etc. to see what was going on. Note that you can use the Prefix parameter in your call to list_objects_v2 to limit it to a particular prefix.
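A small sketch of the client-side approach from the comment above: pages are fetched one at a time, restricted with Prefix, and the newest entries across all pages are picked locally. The helper name latest_n_objects and its default of ten are illustrative. As background, list_objects_v2 returns keys in lexicographic order by name rather than by date, so a truncated first page holds the first 1,000 keys alphabetically; that may be related to the behaviour in the question, though it cannot be confirmed without seeing the bucket.

```python
import heapq

def latest_n_objects(s3_client, bucket, prefix, n=10):
    """Return up to n objects under prefix, newest first, sorted on the client."""
    paginator = s3_client.get_paginator("list_objects_v2")
    all_objects = (
        obj
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
        for obj in page.get("Contents", [])
    )
    # nlargest scans every page but keeps only n candidates in memory
    return heapq.nlargest(n, all_objects, key=lambda obj: obj["LastModified"])
```

Calling latest_n_objects(boto3.client('s3'), my_bucket_name, 'some/prefix/', n=1) would return a one-element list with the latest object under that prefix (the prefix shown is a placeholder).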

Instead of listing all files, you could use S3 Inventory: https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html, which provides a scheduled alternative to the Amazon S3 synchronous List API operations. Amazon S3 Inventory does not use the List API operations to audit your objects and does not affect the request rate of your bucket.

answered 5 months ago