1 Answer
Hey Dave,
It sounds like a perfect use case for Glue, especially because the data volumes and the concurrency you describe are not too massive.
- A Glue job accepts input values at runtime as parameters passed into the job. Parameters can be reliably read inside the ETL script using AWS Glue's getResolvedOptions function (see the first sketch after this list).
- You control the capacity of each run with Data Processing Units (DPUs), and you can cap how many runs of a job execute at once with the job's Max concurrent runs setting (ExecutionProperty.MaxConcurrentRuns; see the second sketch after this list).
- For orchestrating more complex ETL graphs, check out the workflows documentation; it describes an architecture similar to what you described in point 6: https://docs.aws.amazon.com/glue/latest/dg/orchestrate-using-workflows.html. In addition, you can chain job triggers and CloudWatch notifications, as in this example: https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/ (a trigger sketch follows this list).
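For point 1, a minimal sketch of reading a runtime parameter inside the ETL script. The parameter name input_path is hypothetical, used only for illustration:

```python
# Inside the Glue ETL script: read runtime parameters.
import sys
from awsglue.utils import getResolvedOptions

# "input_path" is a hypothetical parameter name; JOB_NAME is passed by Glue.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "input_path"])
print(f"Processing data from {args['input_path']}")
```

You would then start the job with that argument (note the leading "--" when passing, dropped when reading):

```python
import boto3

glue = boto3.client("glue")
# Job name and S3 path are placeholders.
glue.start_job_run(
    JobName="my-etl-job",
    Arguments={"--input_path": "s3://my-bucket/incoming/file.json"},
)
```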
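For point 2, a hedged sketch of capping concurrency. UpdateJob overwrites the whole job definition, so a common pattern is to fetch the current definition first; the job name is a placeholder, and the list of read-only fields to drop may vary by Glue version, so treat it as an assumption:

```python
import boto3

glue = boto3.client("glue")
job_name = "my-etl-job"  # placeholder

# UpdateJob replaces the job definition, so start from the current one.
job = glue.get_job(JobName=job_name)["Job"]
# Drop fields that GetJob returns but JobUpdate does not accept
# (assumed list; adjust if the API rejects other fields).
for read_only in ("Name", "CreatedOn", "LastModifiedOn", "AllocatedCapacity"):
    job.pop(read_only, None)
# Allow at most 3 simultaneous runs of this job.
job["ExecutionProperty"] = {"MaxConcurrentRuns": 3}
glue.update_job(JobName=job_name, JobUpdate=job)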
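And for the chaining in point 3, a conditional trigger can start one job when another finishes successfully; a sketch with placeholder job names:

```python
import boto3

glue = boto3.client("glue")
# Start "job-b" whenever "job-a" succeeds (names are placeholders).
glue.create_trigger(
    Name="run-job-b-after-job-a",
    Type="CONDITIONAL",
    StartOnCreation=True,
    Predicate={
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "JobName": "job-a",
                "State": "SUCCEEDED",
            }
        ]
    },
    Actions=[{"JobName": "job-b"}],
)
```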
Good luck! Ido
answered 5 years ago
Hi Ido, thank you for your response. I still have some doubts about the parameters you mentioned in the first point, and I believe you can assist me with that. Currently, I am using a Glue workflow to process my data, triggered by an EventBridge event. The workflow starts with a Glue job that fetches the last file uploaded to the monitored S3 bucket, i.e. the file whose upload triggered the event. However, I'm unsure how to adapt my Glue job to receive the exact file that triggered the workflow, and I'm concerned about how the workflow will handle multiple files being uploaded simultaneously.
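One way to inspect what started the run from inside the job, sketched under two assumptions: Glue passes WORKFLOW_NAME and WORKFLOW_RUN_ID as arguments to jobs started by a workflow, and for EventBridge-triggered workflows the batched event IDs appear in the run properties under an aws:eventIds key (this second key is an assumption worth verifying against an actual run):

```python
# Inside a Glue job started by a workflow: identify the triggering run.
import sys
import boto3
from awsglue.utils import getResolvedOptions

# Glue passes these two arguments to jobs started by a workflow.
args = getResolvedOptions(sys.argv, ["WORKFLOW_NAME", "WORKFLOW_RUN_ID"])

glue = boto3.client("glue")
props = glue.get_workflow_run_properties(
    Name=args["WORKFLOW_NAME"],
    RunId=args["WORKFLOW_RUN_ID"],
)["RunProperties"]

# For EventBridge-triggered workflows the batched event IDs are said to be
# exposed under "aws:eventIds" (assumption; verify against your own runs).
print(props.get("aws:eventIds"))
```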