How do I access cross-account resources with an Airflow DAG on Amazon MWAA and troubleshoot related issues?

I want to access cross-account resources with an Airflow DAG on Amazon Managed Workflows for Apache Airflow (Amazon MWAA) and troubleshoot related issues.

Resolution

The apache-airflow-providers-amazon package is preinstalled in Amazon MWAA. This package provides a variety of AWS operators and hooks that manage tasks across AWS services. The following steps use AWS operators with AWS Identity and Access Management (IAM) roles and Airflow connections for cross-account access with Amazon MWAA. For more information, see Amazon Web Services Connection on the Apache Airflow website.

To use Apache Airflow to turn on cross-account resource access in Amazon MWAA, complete the following steps:

Note: The following steps invoke a cross-account AWS Glue job and require two AWS accounts. Account A is the source account and must already have an Amazon MWAA environment. Account B is the target account that contains the AWS Glue job.

Create an IAM role in Account B for AWS Glue

Complete the following steps in Account B:

  1. Open the IAM console.
  2. Choose Roles, and then choose Create role.
  3. For Trusted entity type, choose Custom trust policy. Then, enter a trust policy that allows the Amazon MWAA runtime role in Account A to assume the Account B role:
    Example trust policy:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::example-account-a-id:role/example-mwaa-runtime-role"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    Note: Replace example-account-a-id with your Account A ID and example-mwaa-runtime-role with the name of the Amazon MWAA runtime role.
  4. For Permissions, grant the AWS Glue actions that the task needs, such as glue:StartJobRun.
    Note: For full access to AWS Glue, attach the AWSGlueConsoleFullAccess managed policy.
    Example permission policy for the new role:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
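            "glue:StartJobRun",
            "glue:GetJob",
            "glue:GetJobRun"
          ],
          "Resource": "arn:aws:glue:example-region:example-account-b-id:job/example-job"
        }
      ]
    }
    Note: Replace example-region with your AWS Region, example-account-b-id with your Account B ID, and example-job with your AWS Glue job name. The AWS Glue job is in Account B, so the resource ARN uses the Account B ID. By default, the GlueJobOperator waits for the job run to finish, so the role also needs glue:GetJobRun to check the run status.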
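The role and its policies can also be created in code, which helps keep the account IDs, Region, and job name consistent between the two policy documents. The following is a minimal sketch, not an official procedure: it assumes that `iam_client` is a boto3 IAM client that uses Account B credentials, and the helper names and the inline policy name `GlueCrossAccountAccess` are hypothetical. It grants `glue:GetJobRun` on the assumption that the task waits for the job run to finish and polls its status.

```python
import json


def build_trust_policy(mwaa_role_arn: str) -> dict:
    """Trust policy that lets the Amazon MWAA runtime role in Account A assume this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": mwaa_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_glue_policy(region: str, account_b_id: str, job_name: str) -> dict:
    """Permissions policy scoped to a single AWS Glue job that lives in Account B."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # GetJobRun is assumed to be needed so that the task can poll the run status.
                "Action": ["glue:StartJobRun", "glue:GetJob", "glue:GetJobRun"],
                "Resource": f"arn:aws:glue:{region}:{account_b_id}:job/{job_name}",
            }
        ],
    }


def create_glue_cross_account_role(iam_client, role_name: str, mwaa_role_arn: str,
                                   region: str, account_b_id: str, job_name: str) -> None:
    """Create the role in Account B and attach the AWS Glue permissions as an inline policy.

    iam_client is assumed to be a boto3 IAM client that uses Account B credentials.
    """
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy(mwaa_role_arn)),
    )
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName="GlueCrossAccountAccess",  # hypothetical inline policy name
        PolicyDocument=json.dumps(build_glue_policy(region, account_b_id, job_name)),
    )
```

For example, from a session in Account B you might call `create_glue_cross_account_role(boto3.client("iam"), "example-glue-role", "arn:aws:iam::example-account-a-id:role/example-mwaa-runtime-role", "example-region", "example-account-b-id", "example-job")`.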

Update the Amazon MWAA runtime role in Account A

Complete the following steps in Account A:

  1. Open the IAM console.
  2. Locate the Amazon MWAA runtime role. To find the role that's associated with your environment, open the environment's details page in the Amazon MWAA console, and then under Permissions, see Execution role.
  3. Edit the role's permissions policy to allow sts:AssumeRole on the role in Account B.
    Example policy:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "sts:AssumeRole",
          "Resource": "arn:aws:iam::example-account-b-id:role/example-glue-role"
        }
      ]
    }
    Note: Replace example-account-b-id with your Account B ID and example-glue-role with your AWS Glue role name.
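The same inline policy can be added to the runtime role in code. This minimal sketch assumes that `iam_client` is a boto3 IAM client that uses Account A credentials; the helper names and the default policy name are hypothetical. Because `put_role_policy` adds or replaces an inline policy by name, rerunning it doesn't create duplicates. The execution role ARN shown on the environment details page can include a path such as `service-role/`, so the sketch takes the role name from the last segment of the ARN.

```python
import json


def role_name_from_arn(role_arn: str) -> str:
    """Return the IAM role name from a role ARN, dropping any path such as service-role/."""
    return role_arn.split(":role/", 1)[1].split("/")[-1]


def build_assume_role_policy(target_role_arn: str) -> dict:
    """Identity policy that allows the runtime role to assume the role in Account B."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": target_role_arn}
        ],
    }


def allow_runtime_role_to_assume(iam_client, runtime_role_arn: str, target_role_arn: str,
                                 policy_name: str = "AssumeGlueCrossAccountRole") -> None:
    """Add or replace an inline policy on the Amazon MWAA runtime role in Account A."""
    iam_client.put_role_policy(
        RoleName=role_name_from_arn(runtime_role_arn),
        PolicyName=policy_name,  # hypothetical inline policy name
        PolicyDocument=json.dumps(build_assume_role_policy(target_role_arn)),
    )
```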

Create an Airflow connection in Amazon MWAA

Complete the following steps in Account A:

  1. Access your Amazon MWAA environment's Airflow UI.
  2. Select Admin, and then choose Connections.
  3. To add a new connection, select Add a new record. Then, enter the following connection details:
    For Connection ID, assign a unique identifier for the connection.
    For Connection Type, choose Amazon Web Services from the dropdown list.
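    For AWS Access Key ID and AWS Secret Access Key, leave the fields blank. The connection starts from the Amazon MWAA runtime role's credentials and then assumes the role in the Extra field, so access keys aren't needed.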
    For Extra, input the role ARN and Region in JSON format.
    Example Extra field:
    { "role_arn": "arn:aws:iam::example-account-b-id:role/example-glue-role", "region_name": "example-region" }
    Note: Replace example-account-b-id with your Account B ID, example-glue-role with your AWS Glue role name, and example-region with your Region.
  4. Choose Save. To connect to additional AWS accounts, repeat steps 1 through 4 for each account.
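If you prefer to define connections as code instead of in the Airflow UI, Airflow 2.3 and later can read a connection that's serialized as JSON, for example from an environment variable named `AIRFLOW_CONN_` followed by the upper-cased connection ID. The following minimal sketch builds that JSON with the same Extra values. It assumes the `aws` connection type; the helper name is hypothetical, and how you supply the value to your environment depends on your setup.

```python
import json


def build_aws_connection_json(role_arn: str, region_name: str) -> str:
    """Serialize an Amazon Web Services connection that assumes role_arn.

    No login or password is set, so the hook starts from the environment's own
    credentials (the Amazon MWAA runtime role) and then assumes role_arn.
    """
    connection = {
        "conn_type": "aws",
        "extra": {"role_arn": role_arn, "region_name": region_name},
    }
    return json.dumps(connection)


print(build_aws_connection_json("arn:aws:iam::example-account-b-id:role/example-glue-role", "example-region"))
# {"conn_type": "aws", "extra": {"role_arn": "arn:aws:iam::example-account-b-id:role/example-glue-role", "region_name": "example-region"}}
```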

Configure the GlueJobOperator in your Airflow DAG

To configure the GlueJobOperator in your DAG, set its aws_conn_id parameter to the ID of the Airflow connection that you created and its job_name parameter to the AWS Glue job in Account B. For more information, see airflow.providers.amazon.aws.operators.glue on the Apache Airflow website.

Example DAG:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# Manually triggered DAG (no schedule) that starts the AWS Glue job in Account B
dag = DAG(
    'glue_cross_account',
    default_args=default_args,
    schedule_interval=None,
    catchup=False,
)

run_glue_job = GlueJobOperator(
    task_id='run_glue_job',
    job_name='example-glue-job',  # existing AWS Glue job in Account B
    # Connection whose Extra field holds the Account B role_arn and region_name
    aws_conn_id='example-aws-connection-id',
    dag=dag,
)

Note: Replace example-glue-job with your AWS Glue job name and example-aws-connection-id with your Airflow connection ID.

When the GlueJobOperator task runs, Airflow assumes the role that's specified in the connection's role_arn. This role has the permissions that are required to run AWS Glue jobs in Account B. As a result, the Airflow task in Account A starts the AWS Glue job in Account B, and the job uses the resources and data in Account B.
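Conceptually, the role_arn in the connection makes the hook call AWS STS AssumeRole with the runtime role's credentials and then build the AWS Glue client from the temporary credentials. The following is a simplified sketch of that flow, not the provider's actual implementation. It assumes boto3-style clients that are passed in as parameters, so it can run without AWS access in a test; the function name and session name are hypothetical.

```python
def start_cross_account_glue_job(sts_client, glue_client_factory, role_arn: str,
                                 job_name: str, region_name: str,
                                 session_name: str = "mwaa-glue-cross-account") -> str:
    """Assume the Account B role, then start the AWS Glue job with the temporary credentials.

    sts_client is assumed to be a boto3 STS client that uses the runtime role's credentials.
    glue_client_factory is assumed to accept boto3.client keyword arguments,
    for example functools.partial(boto3.client, "glue").
    Returns the JobRunId of the new run.
    """
    # Temporary credentials for the role in Account B
    creds = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)["Credentials"]
    glue = glue_client_factory(
        region_name=region_name,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    # The job run starts in Account B because the client uses Account B credentials
    return glue.start_job_run(JobName=job_name)["JobRunId"]
```

With boto3, you might pass `boto3.client("sts")` and `functools.partial(boto3.client, "glue")` as the first two arguments.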

For more information, see AWS Glue on the Apache Airflow website.

Troubleshooting

If a task fails or doesn't trigger the AWS Glue job, take the following actions:

  • Check the Apache Airflow logs. The task logs show what happened during a task run and help you identify issues such as IAM permission errors, connection problems, and configuration errors.
  • Check your IAM permissions. Make sure that the cross-account IAM role in Account B has the required permissions for AWS Glue operations. Also, make sure that the roles involved can access the required Amazon Simple Storage Service (Amazon S3) resources.
  • Check your Airflow connection. Make sure that the aws_conn_id value in your DAG matches the connection ID, and that the connection's role_arn points to the Account B role that the runtime role is allowed to assume.
  • Check your script location. Make sure that the Amazon S3 path to your AWS Glue script is accessible and the script is correctly formatted.
  • For environments in private subnets of an Amazon Virtual Private Cloud (Amazon VPC) without internet access, check your network configuration. Make sure that your network configuration, such as VPC endpoints, transit gateways, or Amazon VPC peering connections, allows communication with the AWS services that the task calls, such as AWS STS and AWS Glue.
  • Check whether any resource that the job accesses has a resource-based policy, and make sure that the policy allows cross-account access.
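Some of the preceding checks can be scripted. The following minimal sketch assumes the Account B role and then reads the job definition with glue:GetJob, which surfaces trust policy, permission, and job name problems. It also returns the job's script location so that you can check the Amazon S3 path. It assumes boto3-style clients that are passed in as parameters; the function name is hypothetical, and it reports any exception as a failed check instead of distinguishing error codes.

```python
def diagnose_cross_account_glue(sts_client, glue_client_factory, role_arn: str,
                                job_name: str, region_name: str) -> list:
    """Run basic cross-account checks in order and return (check, ok, detail) tuples.

    Clients are assumed to behave like boto3 clients; any exception counts as a failure.
    """
    results = []
    try:
        creds = sts_client.assume_role(RoleArn=role_arn, RoleSessionName="mwaa-diagnostics")["Credentials"]
        results.append(("assume_role", True, role_arn))
    except Exception as exc:  # for example, AccessDenied from a missing trust policy or sts:AssumeRole
        results.append(("assume_role", False, str(exc)))
        return results  # the remaining check needs the temporary credentials
    glue = glue_client_factory(
        region_name=region_name,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    try:
        job = glue.get_job(JobName=job_name)["Job"]
        # Report the script path so that you can confirm the S3 object is reachable
        script = job.get("Command", {}).get("ScriptLocation", "")
        results.append(("get_job", True, script))
    except Exception as exc:  # for example, a wrong job name or missing glue:GetJob
        results.append(("get_job", False, str(exc)))
    return results
```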

Note: In Airflow 2.5.1, version 7.1.0 of the Amazon provider package has a known issue that requires you to specify an IAM role name or ARN in the GlueJobOperator task. To resolve this issue, upgrade to version 8.2.0 or later of the Amazon provider package (the default version for Amazon MWAA 2.6.3). For more information, see GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version on the GitHub website.
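To check whether an environment is affected, you can compare the installed Amazon provider version with 8.2.0, for example from a short task or from a local Python environment that uses the same requirements. This is a minimal sketch that uses only the standard library (`importlib.metadata`); the helper names are hypothetical, and the version parsing ignores pre-release suffixes.

```python
import re
from importlib import metadata

FIXED_VERSION = (8, 2, 0)  # first provider version recommended in the note above


def parse_version(version: str) -> tuple:
    """Turn '7.1.0' into (7, 1, 0); missing parts become 0 and suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_provider_upgrade(version: str) -> bool:
    """True when the version is older than the fixed provider version."""
    return parse_version(version) < FIXED_VERSION


def installed_amazon_provider_version() -> str:
    """Installed apache-airflow-providers-amazon version; raises PackageNotFoundError if absent."""
    return metadata.version("apache-airflow-providers-amazon")
```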

Related information

Create a role

AWS OFFICIAL
Updated 3 months ago