Trigger Lambda only if 2 files are placed in an S3 path

0

Hi,

I need to trigger a Lambda function only if 2 files are placed in a specific bucket path. The Lambda should run only when both files (test1.csv and test2.csv) have been placed. Can you please suggest what the options are?

I will create an EventBridge rule, but I need a condition covering the 2 files. Also, is a wait time in the Lambda a better option?

S3 paths: s3://xyz/test1.csv and s3://xyz/test2.csv

asked 2 months ago · 207 views
5 Answers
1

In AWS, there isn't a direct way to trigger a Lambda function based on the presence of multiple files. Lambda functions can be triggered by single S3 events, such as the upload of a single file, but they do not natively support waiting for multiple files to be uploaded before triggering.

To solve this problem, you need to implement a mechanism that tracks the uploads of these two files and triggers the Lambda function only when both files are confirmed to be in the S3 bucket.

"Also, is a wait time in the Lambda a better option?"

Using a wait time in a Lambda function is generally not a recommended approach for this scenario. Here's why:

  • Cost: Lambda is billed by execution duration, so any time spent waiting is paid idle time.
  • Timeout: a Lambda function can run for at most 15 minutes, so it cannot wait arbitrarily long.
  • Inefficiency: the function sits occupied doing nothing useful (or repeatedly polling S3) while it waits.

Possible solution

You might want to consider combining a few services to achieve this: S3 Event Notifications, a DynamoDB table for tracking, and a Step Function with a couple of Lambda functions:

  1. First, configure S3 event notifications to call a Lambda function (let's call it Lambda A) whenever a new file is uploaded to your specified S3 path.

  2. Set up a DynamoDB table to keep tabs on the uploaded files. Each entry in the table could include details like the file name, its upload status, and a timestamp.

  3. Whenever a new file lands in your S3 bucket, Lambda A springs into action. It checks if the new file is one of the ones we're looking for (test1.csv or test2.csv). If it is, Lambda A updates its record in the DynamoDB table to show that it's been uploaded.

  4. Use AWS Step Functions to manage the whole process. After Lambda A updates the DynamoDB record, it can kick off a Step Function execution. This Step Function then checks the DynamoDB table to see if both files have been uploaded. (A sketch of Lambda A is shown after this list.)

  5. If the Step Function finds that both files are uploaded, it triggers another Lambda function (let's call it Lambda B) to start processing the files. If not, the Step Function ends, and nothing else happens.

  6. Once the files are processed, you can use Lambda B or another Lambda function to update the DynamoDB table, resetting the file statuses. This gets everything ready for the next time files are uploaded.
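For illustration only, here is a minimal sketch of what "Lambda A" from steps 3 and 4 could look like. The table name file-upload-tracker, its file_name key, and the STATE_MACHINE_ARN environment variable are assumptions for this sketch, not existing resources, so adjust them to your own setup.

import json
import os

import boto3

dynamodb = boto3.resource('dynamodb')
sfn_client = boto3.client('stepfunctions')

# Assumed resources: a DynamoDB table keyed on file_name, and a state machine
# whose ARN is supplied through an environment variable.
TABLE_NAME = 'file-upload-tracker'
STATE_MACHINE_ARN = os.environ['STATE_MACHINE_ARN']
EXPECTED_FILES = {'test1.csv', 'test2.csv'}

def lambda_handler(event, context):
    key = event['Records'][0]['s3']['object']['key']
    if key not in EXPECTED_FILES:
        return  # ignore objects we are not waiting for

    # Step 3: mark this file as uploaded.
    table = dynamodb.Table(TABLE_NAME)
    table.put_item(Item={'file_name': key, 'status': 'UPLOADED'})

    # Step 4: start the Step Function, which reads the table and decides
    # whether to invoke Lambda B.
    sfn_client.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps({'uploaded_key': key}),
    )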

EXPERT
answered 2 months ago
0

@Uri The files will be placed within a 5-minute timespan, so is looking for the files in the S3 bucket from the Lambda code still a bad option?

answered 2 months ago
  • Still not a good option. You are paying for idle time and you are paying for extra S3 API calls.

0

There is no direct way of doing this. You will need to create your own solution. For instance, generate an event for each file uploaded and trigger a function on that event. The function saves in DynamoDB the fact that a file was uploaded and checks whether the other file was uploaded or not. If it was, it performs the work; if not, it does nothing else.
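As an illustration of this pattern (the table name upload-tracker, its batch_id key, and the process_files stub are assumptions for the sketch), the record-and-check step can be done atomically with a single DynamoDB update, which avoids a race if both files arrive at almost the same time:

import boto3

dynamodb = boto3.client('dynamodb')

EXPECTED = {'test1.csv', 'test2.csv'}

def lambda_handler(event, context):
    key = event['Records'][0]['s3']['object']['key']
    if key not in EXPECTED:
        return  # not one of the files we are waiting for

    # Record this file and read back the full set of files seen so far.
    response = dynamodb.update_item(
        TableName='upload-tracker',
        Key={'batch_id': {'S': 'daily-load'}},
        UpdateExpression='ADD received_files :f',
        ExpressionAttributeValues={':f': {'SS': [key]}},
        ReturnValues='ALL_NEW',
    )
    received = set(response['Attributes']['received_files']['SS'])

    if EXPECTED.issubset(received):
        # Both files are present: do the real work, then clear the tracking
        # item so the next batch starts fresh.
        process_files()
        dynamodb.delete_item(
            TableName='upload-tracker',
            Key={'batch_id': {'S': 'daily-load'}},
        )

def process_files():
    print('Both test1.csv and test2.csv are in the bucket; processing...')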

Waiting in a function is bad practice. First, you may need to wait longer than the maximum function runtime; second, you would have to constantly poll S3 to find out if the second file was uploaded (you can't get notified in the same invocation when the second file arrives); and third, you add to the cost, as in Lambda you pay for duration.

AWS
EXPERT
Uri
answered 2 months ago
0

Is there any mechanism that tracks the uploads of these two files and triggers the Lambda function only when both files are confirmed to be in the S3 bucket?

answered 2 months ago
0

If the function must run only when both objects are present, set up two S3 event notifications in the bucket (one for each object), both calling the same Lambda function (a boto3 sketch for creating these follows the two configurations):

Event notification #1

  • Prefix: test1.csv
  • Event type: All object create events
  • Destination: MyLambdaFunction

And

Event notification #2

  • Prefix: test2.csv
  • Event type: All object create events
  • Destination: MyLambdaFunction
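If you prefer to script this rather than use the console, a possible sketch with boto3 is below. The bucket name xyz and the function ARN are placeholders; note that this single call replaces the bucket's entire notification configuration, and the Lambda function must already allow invocation by S3.

import boto3

s3 = boto3.client('s3')

# Placeholder ARN - substitute your own account, region and function name.
LAMBDA_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:MyLambdaFunction'

def notification(rule_id, prefix):
    # One "object created" rule that fires only for keys starting with prefix.
    return {
        'Id': rule_id,
        'LambdaFunctionArn': LAMBDA_ARN,
        'Events': ['s3:ObjectCreated:*'],
        'Filter': {'Key': {'FilterRules': [{'Name': 'prefix', 'Value': prefix}]}},
    }

s3.put_bucket_notification_configuration(
    Bucket='xyz',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            notification('notify-test1', 'test1.csv'),
            notification('notify-test2', 'test2.csv'),
        ]
    },
)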

At the start of MyLambdaFunction, check that both objects exist. If they do, run the rest of the function; otherwise exit immediately.

import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')

def object_exists(s3_bucket_name, s3_object) -> bool:
    # head_object raises a ClientError (e.g. 404) when the key does not exist.
    try:
        s3_client.head_object(Bucket=s3_bucket_name, Key=s3_object)
        return True
    except ClientError:
        return False

def lambda_handler(event, context):
    if object_exists('xyz', 'test1.csv') and object_exists('xyz', 'test2.csv'):
        print("Run the rest of the function in here.")
    else:
        print("Cannot run, both files not present.")
        return
EXPERT
Steve_M
answered 2 months ago
