Hi,
First of all, apologies for this simple question. I am new to AWS.
I have a job that runs on an EC2 machine (scheduled using cron) and writes 3 output files to an S3 bucket path. The 3 files don't all arrive at the same time. As soon as the files have arrived, I need to run a Glue job that copies them to an S3 bucket path in another account (only the Glue execution role has been given access to the cross-account S3 bucket, which is why I have to use Glue to transfer the files). As soon as each file is copied over, it should be archived to a separate folder in S3.
I was thinking of scheduling this Glue job to run every hour, transferring the files and archiving them once they have arrived. But once the files are archived, the job doesn't need to run again, so scheduling it every hour looks like a waste of AWS resources.
How can I trigger this Glue job based only on the arrival of the S3 files? I see that this can be achieved using Lambda. Can I get details on this: what additional roles do I need to set up for Lambda, and what else is involved in creating the Lambda trigger function? As of now, only the Glue role has read/write access to both S3 paths.
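For illustration, here is a minimal sketch of what such a Lambda function might look like, assuming the source bucket is configured to send object-created notifications to the function. The environment variable GLUE_JOB_NAME and the job arguments --source_bucket and --source_key are hypothetical names, not something from the setup described above; the Glue script would have to read whatever arguments are chosen. With this approach the Lambda execution role would need glue:StartJobRun on the job (plus the usual CloudWatch Logs permissions), but no S3 access of its own, since the Glue role still does the copying.

```python
import os
import urllib.parse


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 notifications are URL-encoded (spaces arrive as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def start_glue_runs(event, glue_client, job_name):
    """Start one Glue job run per object in the event and return the run IDs."""
    run_ids = []
    for bucket, key in extract_s3_objects(event):
        response = glue_client.start_job_run(
            JobName=job_name,
            # Hypothetical argument names; the Glue script must read them.
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return run_ids


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be tested without it.
    import boto3

    glue = boto3.client("glue")
    job_name = os.environ["GLUE_JOB_NAME"]  # hypothetical environment variable
    return {"started": start_glue_runs(event, glue, job_name)}
```

Note that this starts one run per arriving file. If the files land close together, the job's maximum concurrency setting may need to allow overlapping runs.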
To add to that, ask EventBridge to group events; you don't want to trigger a Glue job for each file that arrives.
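One way to get that grouping, as far as I know, is the event batching condition on a Glue EVENT trigger inside a Glue workflow. This assumes the S3 events already reach EventBridge and that an EventBridge rule targets the workflow. The sketch below only builds and submits the trigger definition; the trigger, workflow, and job names are hypothetical, and the batch size of 3 simply mirrors the three files in the question.

```python
def build_batched_trigger(trigger_name, workflow_name, job_name,
                          batch_size=3, batch_window_seconds=900):
    """Build create_trigger arguments for an event trigger that batches events."""
    return {
        "Name": trigger_name,
        "WorkflowName": workflow_name,  # EVENT triggers belong to a workflow
        "Type": "EVENT",
        "EventBatchingCondition": {
            # Fire once this many events have arrived...
            "BatchSize": batch_size,
            # ...or once this many seconds have passed since the first event.
            "BatchWindow": batch_window_seconds,
        },
        "Actions": [{"JobName": job_name}],
    }


def create_batched_trigger(glue_client, trigger_name, workflow_name, job_name):
    """Create the trigger using a Glue client, e.g. boto3.client('glue')."""
    params = build_batched_trigger(trigger_name, workflow_name, job_name)
    return glue_client.create_trigger(**params)
```

With a batch size of 3, the job would start once per set of three files rather than once per file, and the batch window acts as a fallback if one of the files is late.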