
AWS Glue Workflow triggers, randomly triggering jobs


Hello everyone,

We are facing a major issue with AWS Glue workflows. Starting today, 01/02/2024, we have noticed that some ANY and ALL triggers activate and trigger their assigned job multiple times, causing runs to fail with "Max concurrent runs exceeded". Moreover, some ANY triggers seem to skip their intended target entirely and instead trigger the last job of the workflow, which is by no means a target of that trigger.

This issue affects more than one workflow and more than one trigger. Please note that we have not performed any actions on those specific triggers or workflows in the past few days.

Has anyone else faced a similar issue recently? Do you have any idea what is going wrong? Is there any chance that Workflows are not supported by AWS? (I read something along these lines on Stack Overflow.)

  • We are facing the same problem. Triggers seem to start Glue jobs multiple times. Any news from AWS Support?

  • If this is a sudden change in behavior, as it sounds, please open a support ticket or provide the job workflow run ID here so it can be investigated.

asked 2 years ago, 217 views
1 Answer
Accepted Answer

Hello tmavrikis, is this still an issue? If so, hopefully some of the following recommendations and links can help answer your question. I'll break it down into the various questions and challenges you mentioned.

Workflow Support: I can assure you that AWS Glue workflows are fully supported by AWS. They are an integral part of AWS Glue and are designed to manage complex ETL activities.

Recent Changes: It's important to note that even if you haven't made any changes, updates to the AWS Glue service itself could potentially impact workflow behavior.

To address the issues you're facing, let's consider a few potential causes and solutions:

Concurrency Limits:
    Review your workflow's concurrency settings. Ensure that the maximum number of concurrent workflow runs is set appropriately.
    Check the concurrency limits for individual jobs within the workflow.
    Consider adjusting these limits based on your expected workload.
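As a rough illustration of the concurrency check, here is a minimal Python sketch. It assumes you have already fetched the definitions, for example with boto3's get_workflow and get_job calls, and that you pass in the "Workflow" and "Job" dictionaries from those responses. The function name jobs_below_workflow_concurrency is hypothetical, and treating a missing job limit as 1 is an assumption based on the Glue default.

```python
def jobs_below_workflow_concurrency(workflow, jobs):
    """Return (job name, job limit) pairs for jobs that allow fewer
    concurrent runs than the workflow allows concurrent workflow runs.

    workflow: the "Workflow" dict from a get_workflow response.
    jobs: a list of "Job" dicts from get_job responses.
    """
    # An unset workflow limit means workflow runs are not capped at all.
    workflow_limit = workflow.get("MaxConcurrentRuns")
    risky = []
    for job in jobs:
        # Assumption: a job without an explicit limit uses the default of 1.
        job_limit = job.get("ExecutionProperty", {}).get("MaxConcurrentRuns", 1)
        if workflow_limit is None or job_limit < workflow_limit:
            risky.append((job["Name"], job_limit))
    return risky


if __name__ == "__main__":
    # Hypothetical sample data shaped like the Glue API responses.
    workflow = {"Name": "nightly-etl", "MaxConcurrentRuns": 2}
    jobs = [
        {"Name": "extract", "ExecutionProperty": {"MaxConcurrentRuns": 1}},
        {"Name": "load", "ExecutionProperty": {"MaxConcurrentRuns": 3}},
    ]
    print(jobs_below_workflow_concurrency(workflow, jobs))
```

Note that this only covers overlapping workflow runs. A trigger that fires the same job twice inside a single workflow run can still exceed the job's limit, and this check will not catch that.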

Trigger Configuration:
    Review the configuration of your ANY and ALL triggers. Ensure they are set up correctly and are targeting the intended jobs.
    Check if there are any duplicate or conflicting trigger configurations.
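One way to look for duplicate or conflicting triggers is to list every job that more than one trigger starts. The sketch below assumes you pass in the "Triggers" list from a boto3 get_triggers response. The function name jobs_with_multiple_triggers and the sample trigger names are hypothetical.

```python
from collections import defaultdict


def jobs_with_multiple_triggers(triggers, workflow_name=None):
    """Map each job name to the triggers that start it, keeping only
    jobs that are started by more than one trigger action.

    triggers: the "Triggers" list from a get_triggers response.
    workflow_name: if given, only triggers of that workflow are checked.
    """
    targets = defaultdict(list)
    for trigger in triggers:
        if workflow_name and trigger.get("WorkflowName") != workflow_name:
            continue
        for action in trigger.get("Actions", []):
            job_name = action.get("JobName")  # crawler actions have no JobName
            if job_name:
                targets[job_name].append(trigger["Name"])
    return {job: names for job, names in targets.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical sample data shaped like the Glue API response.
    sample = [
        {"Name": "after-extract", "Actions": [{"JobName": "load"}]},
        {"Name": "after-crawl", "Actions": [{"JobName": "load"}]},
        {"Name": "after-load", "Actions": [{"JobName": "report"}]},
    ]
    print(jobs_with_multiple_triggers(sample))
```

A job appearing here is not necessarily a misconfiguration, since two triggers can legitimately share a target, but it is a good place to start when a job runs more often than expected.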

Event Processing:
    If your workflows are triggered by events, ensure that there isn't a backlog of events causing multiple trigger activations.
    Review your event sources to see if there's been an unexpected increase in event frequency.

Logging and Monitoring:
    Enable detailed logging for your workflows and jobs.
    Use AWS CloudTrail to review API calls related to your Glue workflows.
    Set up CloudWatch alarms to alert you of unusual activity or exceeded thresholds.
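To spot repeated trigger activations in the run history, you could look for runs of the same job that started almost at the same time. The sketch below assumes you pass in the "JobRuns" list for a single job from a boto3 get_job_runs response. The function name near_simultaneous_runs and the one-minute window are assumptions, so adjust the window to your schedule.

```python
from datetime import datetime, timedelta


def near_simultaneous_runs(job_runs, window=timedelta(minutes=1)):
    """Return pairs of run IDs whose start times are within `window`.

    job_runs: the "JobRuns" list for one job from a get_job_runs response;
    each entry needs an "Id" and a "StartedOn" datetime.
    """
    runs = sorted(job_runs, key=lambda run: run["StartedOn"])
    pairs = []
    # Compare each run with the next one in start-time order.
    for earlier, later in zip(runs, runs[1:]):
        if later["StartedOn"] - earlier["StartedOn"] <= window:
            pairs.append((earlier["Id"], later["Id"]))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample data shaped like the Glue API response.
    start = datetime(2024, 1, 1, 8, 0, 0)
    sample = [
        {"Id": "jr_a", "StartedOn": start},
        {"Id": "jr_b", "StartedOn": start + timedelta(seconds=5)},
        {"Id": "jr_c", "StartedOn": start + timedelta(hours=1)},
    ]
    print(near_simultaneous_runs(sample))
```

Each job run entry also carries a TriggerName field, which can help you tell which trigger started the extra run.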

Version Control:
    If you're using any form of version control or CI/CD for your Glue resources, check if there have been any recent deployments or changes.

Next Steps:

  1. Implement the above checks and adjustments as appropriate. If the issue persists, I recommend opening a case with AWS Support. They can provide more in-depth, account-specific assistance and escalate to the Glue service team if necessary.

  2. To verify that the problem has been resolved, monitor your workflows closely after making any changes, run test workflows with controlled inputs to ensure they behave as expected, and check that jobs trigger in the correct order and run only once, as intended.

  3. If you're still unsure about the root cause, temporarily simplify your workflows to isolate the issue, create a test workflow that mimics the problematic behavior to see if you can reproduce it consistently, and review any recent changes in your data sources or job scripts that might be impacting workflow execution.
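For the verification step, one option is to inspect the graph of a finished workflow run and count how many times each job node actually ran. The sketch below assumes you pass in the "Graph" dictionary from a boto3 get_workflow_run response requested with IncludeGraph=True. The function names job_run_counts and jobs_not_run_exactly_once are hypothetical.

```python
def job_run_counts(graph):
    """Count job runs per job node in a workflow run graph.

    graph: the "Graph" dict from a get_workflow_run response.
    """
    counts = {}
    for node in graph.get("Nodes", []):
        if node.get("Type") != "JOB":
            continue  # skip trigger and crawler nodes
        runs = node.get("JobDetails", {}).get("JobRuns", [])
        counts[node["Name"]] = counts.get(node["Name"], 0) + len(runs)
    return counts


def jobs_not_run_exactly_once(graph):
    """Return {job name: run count} for jobs that ran zero or several times."""
    return {name: n for name, n in job_run_counts(graph).items() if n != 1}


if __name__ == "__main__":
    # Hypothetical sample data shaped like the Glue API response.
    sample = {
        "Nodes": [
            {"Type": "TRIGGER", "Name": "start"},
            {"Type": "JOB", "Name": "extract",
             "JobDetails": {"JobRuns": [{"Id": "jr_1"}]}},
            {"Type": "JOB", "Name": "load",
             "JobDetails": {"JobRuns": [{"Id": "jr_2"}, {"Id": "jr_3"}]}},
        ]
    }
    print(jobs_not_run_exactly_once(sample))
```

A count of zero can be legitimate if a conditional trigger was never satisfied, so treat the result as a list of jobs to look at rather than a list of failures.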

Remember, when making any changes to your AWS Glue resources, always test in a non-production environment first to ensure the changes have the desired effect without disrupting your production workloads. If you need more specific guidance or if these steps don't resolve the issue, please open a case with AWS Support.

Resources:

https://docs.aws.amazon.com/glue/latest/webapi/API_StartWorkflowRun.html

https://docs.aws.amazon.com/glue/latest/webapi/API_ResumeWorkflowRun.html

https://repost.aws/questions/QUSR3S3B-OSReZsp9xSPPuhg/glue-queue-max-concurrent-runs-exceeded

AWS
answered 10 months ago
