What happens when MWAA is in a modifying state?


See title. I'd like to know the exact impact, because the state lasts a long time, typically 20 to 30 minutes in my experience. For example, do DAGs scheduled to run during that interval still run?

asked 2 months ago · 447 views
2 Answers

This comes down to which parts of the MWAA environment you are actually modifying. For example, if I add a third scheduler to an environment that already has two, the environment stays in the Updating status until the new scheduler is fully available. Meanwhile, the Apache Airflow UI remains responsive, and the two existing schedulers continue to schedule any DAGs that are due to run.

However, if you are upgrading the Apache Airflow version, for example from 2.7.2 to 2.8.1, the environment may be unavailable during the update. The console shows an alert when you select this kind of change while editing the environment:

Alert: When you select a new Apache Airflow minor version to upgrade your environment to, the update procedure can take up to 2 hours. Your environment will be unavailable while Amazon MWAA creates a backup of your data, updates Apache Airflow, and restores each component of the environment.
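Whichever kind of change you make, you can watch the environment's status to see when the update has actually finished. The sketch below is a minimal example, assuming a boto3 MWAA client (`boto3.client("mwaa")`) and its `get_environment(Name=...)` call; the set of in-progress statuses, the polling interval, and the timeout are assumptions to check against the MWAA API reference. The client is passed in as a parameter so the loop can be tried without AWS credentials.

```python
import time
from typing import Any, Callable

# Assumed set of transitional statuses during an update; verify against the
# MWAA API reference for your environment's version.
IN_PROGRESS = {"UPDATING", "CREATING_SNAPSHOT", "ROLLING_BACK"}


def wait_for_update(client: Any, name: str, poll_seconds: float = 60.0,
                    timeout_seconds: float = 3 * 60 * 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_environment until the environment leaves an in-progress status.

    `client` is expected to be boto3.client("mwaa") or any object with a
    compatible get_environment(Name=...) method. Returns the final status
    string, e.g. "AVAILABLE" or "UPDATE_FAILED".
    """
    waited = 0.0
    while True:
        status = client.get_environment(Name=name)["Environment"]["Status"]
        if status not in IN_PROGRESS:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{name} still {status} after {waited:.0f}s")
        sleep(poll_seconds)  # injectable so tests don't actually wait
        waited += poll_seconds

# Usage (requires boto3 and AWS credentials; "my-environment" is hypothetical):
#   import boto3
#   print(wait_for_update(boto3.client("mwaa"), "my-environment"))
```

The three-hour default timeout is a hypothetical choice that leaves headroom over the up-to-two-hour minor version upgrade mentioned in the alert.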

AWS
answered 2 months ago
  • Thanks! Can you list other actions that can cause unavailability? Or, conversely, which actions will not affect availability? In particular, I am wondering about editing requirements, plugins, startup scripts, configuration options, and the environment class details, as those are changed fairly frequently.


Hi,

When you update the environment after editing requirements, plugins, startup scripts, configuration options, or the environment class details, MWAA normally replaces the existing schedulers and workers with newly provisioned ones that use the updated configuration. You can verify this by looking for new log streams in the Scheduler and Worker log groups. A few minutes after the update starts, new log streams should begin to appear, and their timestamps also show how long your requirements installation and the startup of the scheduler and worker processes take.
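The log-stream check described above can be scripted. This is a minimal sketch, assuming a boto3 CloudWatch Logs client (`boto3.client("logs")`) and its `describe_log_streams` call, and assuming MWAA's default log group naming `airflow-<environment-name>-<Component>`; confirm the real log group ARNs in the `LoggingConfiguration` returned by `get_environment` before relying on the names. The helper name `new_log_streams` is hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List


def new_log_streams(logs_client: Any, env_name: str, since: datetime,
                    components: Iterable[str] = ("Scheduler", "Worker"),
                    ) -> Dict[str, List[str]]:
    """Return, per component, the names of log streams created at or after `since`.

    Assumes log groups are named "airflow-<env_name>-<component>".
    Reads only the first page of results (newest activity first), which is
    usually enough right after an update; add pagination for busy environments.
    """
    since_ms = int(since.timestamp() * 1000)  # CloudWatch uses epoch milliseconds
    result: Dict[str, List[str]] = {}
    for component in components:
        group = f"airflow-{env_name}-{component}"
        resp = logs_client.describe_log_streams(
            logGroupName=group, orderBy="LastEventTime", descending=True)
        result[component] = sorted(
            s["logStreamName"] for s in resp.get("logStreams", [])
            if s.get("creationTime", 0) >= since_ms)
    return result
```

Passing a timezone-aware `since` (for example, the time you started the update in UTC) avoids ambiguity when converting to epoch milliseconds.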

If a DAG run is scheduled while the environment is updating, it can fail if its task instances land on a worker that is about to be replaced. The recommendation is therefore to update the environment only during a window in which you can tolerate downtime, and to update as infrequently as is feasible.
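One way to follow that recommendation in automation is to refuse to start an update outside an agreed downtime window. This is a hypothetical helper, not an MWAA feature: `in_window` and `update_if_in_window` are illustrative names, the window boundaries are yours to choose, and the only AWS call assumed is the boto3 MWAA client's `update_environment(Name=..., ...)`, with any changes (such as `AirflowConfigurationOptions`) passed through as keyword arguments.

```python
from datetime import datetime, time
from typing import Any


def in_window(now: datetime, start: time, end: time) -> bool:
    """True if now's time of day falls in [start, end).

    Handles windows that cross midnight (e.g. 23:00 to 02:00). `now` is
    compared as given, so use one timezone (ideally UTC) consistently.
    """
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def update_if_in_window(client: Any, name: str, now: datetime,
                        start: time, end: time, **changes: Any) -> bool:
    """Call client.update_environment only inside the downtime window.

    `client` is expected to be boto3.client("mwaa"); returns True if the
    update request was sent, False if it was skipped.
    """
    if not in_window(now, start, end):
        return False
    client.update_environment(Name=name, **changes)
    return True
```

A skipped update simply returns False, so a scheduler or CI job can retry on its next run instead of forcing a change during business hours.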

How long the update takes depends on the customizations you are making. If it takes much longer than for a vanilla MWAA environment, review your customizations, such as requirements, plugins, startup scripts, or configuration overrides, closely and consider refactoring them.

I hope this helps! Feel free to ask if you have any follow-up questions.

AWS
answered 2 months ago
  • If there is a scheduled dag-run when the environment is updating, it is possible for it to fail if its task instances land on a Worker that is just about to be replaced

    This matches the issue reported in https://technical.thombedford.com/267. However, in the comments, the author says AWS resolved it: "They updated recently saying the issue has been resolved."

    Can you clarify whether this issue is fixed, or is this a different issue?

  • AFAIK, the issue with auto-scaling isn't fully fixed yet and is still being worked on. I'm not certain whether that fix will prevent task failures during an environment update, though. If feasible, I'd encourage you to open a support case with AWS for an official answer from the MWAA service team.
