How do I troubleshoot errors that I receive when I trigger an Amazon EMR step with Amazon MWAA?
I want to troubleshoot errors that I receive when I trigger an Amazon EMR step with Amazon Managed Workflows for Apache Airflow (Amazon MWAA).
Short description
When you trigger an Amazon EMR step with Amazon MWAA, you might receive the following errors:
"An error occurred (AccessDeniedException) when calling the DescribeCluster operation"
"An error occurred (AccessDeniedException) when calling the AddJobFlowSteps operation"
"EMR endpoint is not reachable: botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL"
"An error occurred (InvalidRequestException) when calling the DescribeCluster operation: Cluster id is not valid."
"An error occurred (ValidationException) when calling the AddJobFlowSteps operation: Maximum number of active steps(State = 'Running', 'Pending' or 'Cancel_Pending') for cluster exceeded."
"An error occurred (ValidationException) when calling the AddJobFlowSteps operation: jobFlowId is not valid."
"An error occurred (ValidationException) when calling the AddJobFlowSteps operation: A job flow that is shutting down, terminated, or finished may not be modified."
"No module named 'airflow.providers.amazon.aws.operators.emr_add_steps'"
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.
AccessDeniedException error during an Amazon EMR API call
These errors occur when the Amazon MWAA runtime role lacks permissions for the Amazon EMR actions that the DAG calls. To resolve these errors, make sure that the Amazon MWAA runtime role has the DescribeCluster and AddJobFlowSteps permissions.
Example:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "EMRStepPermissions", "Effect": "Allow", "Action": [ "elasticmapreduce:DescribeCluster", "elasticmapreduce:AddJobFlowSteps" ], "Resource": "arn:aws:elasticmapreduce:example-region:example-account-id:cluster/example-cluster-id" } ] }
Note: Replace example-region with your AWS Region, example-account-id with your account ID, and example-cluster-id with your cluster ID.
EMR endpoint is not reachable: botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL
This error occurs when the Amazon MWAA environment can't reach the Amazon EMR cluster because of a networking misconfiguration. To resolve this error, make sure that the Amazon MWAA Amazon Virtual Private Cloud (Amazon VPC) configuration allows outbound traffic to the cluster.
For public routing, make sure that the Amazon MWAA security groups allow outbound traffic to the cluster and that the subnet route tables have a route to the cluster. For private routing, make sure that an Amazon EMR VPC endpoint is associated with the Amazon MWAA environment's subnets and security group.
If the Amazon EMR cluster is in a different Amazon VPC, then use Amazon VPC peering to establish network connectivity between the VPCs.
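To confirm connectivity from inside the environment, you can run a short test task that calls the Amazon EMR API and surfaces a timeout in the task logs. The following is a minimal sketch and isn't part of the standard resolution; the DAG ID and cluster ID are placeholders that you replace with your own values:
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator
from botocore.config import Config


def check_emr_connectivity():
    # Fails with botocore.exceptions.ConnectTimeoutError when the Amazon MWAA
    # VPC routing, security groups, or VPC endpoints block traffic to the
    # Amazon EMR API.
    emr = boto3.client(
        "emr",
        config=Config(connect_timeout=10, retries={"max_attempts": 1}),
    )
    response = emr.describe_cluster(ClusterId="j-EXAMPLECLUSTERID")  # Placeholder cluster ID
    print(response["Cluster"]["Status"]["State"])


with DAG(
    dag_id="emr_connectivity_check",  # Placeholder DAG ID
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="describe_cluster",
        python_callable=check_emr_connectivity,
    )
If the describe_cluster call times out, then the issue is the network path (security groups, route tables, or VPC endpoints) rather than the DAG code.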
An error occurred (InvalidRequestException) when calling the DescribeCluster operation: Cluster id is not valid
To resolve this error, confirm that the cluster ID exists in the same AWS Region and account as the Amazon MWAA environment. Also, to allow AWS Identity and Access Management (IAM) users to view the cluster details, set VisibleToAllUsers to True in the JOB_FLOW_OVERRIDES parameter. For more information, see Request parameters.
To verify that the Amazon MWAA runtime role has the permissions that are required to describe the cluster, run the describe-cluster AWS CLI command.
To view further details about your Amazon EMR cluster, see View Amazon EMR cluster status and details.
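The following minimal sketch shows where VisibleToAllUsers is set and how to call DescribeCluster to confirm that the cluster exists in the expected Region and account. It's an illustration only: the Region, cluster ID, and cluster settings are placeholders, and the remaining JOB_FLOW_OVERRIDES values are omitted:
import boto3

# Setting VisibleToAllUsers to True lets IAM principals in the account view
# the cluster details. The other cluster settings are omitted here.
JOB_FLOW_OVERRIDES = {
    "Name": "Data-Pipeline-example",
    "ReleaseLabel": "emr-5.29.0",
    "VisibleToAllUsers": True,
}

# Confirm that the cluster ID resolves in the Region that the Amazon MWAA
# environment uses. An InvalidRequestException here points to a wrong cluster
# ID, Region, or account.
emr = boto3.client("emr", region_name="example-region")
cluster = emr.describe_cluster(ClusterId="j-EXAMPLECLUSTERID")
print(cluster["Cluster"]["Name"], cluster["Cluster"]["Status"]["State"])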
An error occurred (ValidationException) when calling the AddJobFlowSteps operation: jobFlowId <'some-string'> is not valid
This error occurs when an incorrect job_flow_id is passed to EmrAddStepsOperator. To resolve this error, make sure that you use the Amazon EMR cluster ID as the job_flow_id. For more information, see Parameters in the Apache Airflow documentation.
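The following minimal sketch shows the typical wiring. The DAG ID, step definition, bucket name, and cluster ID are placeholders, and the import path shown matches newer provider package versions (see the import guidance later in this article):
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

SPARK_STEPS = [
    {
        "Name": "example-spark-step",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                "s3://example-s3-bucket-name/scripts/job.py",
            ],
        },
    }
]

with DAG(
    dag_id="emr_add_steps_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    step_adder = EmrAddStepsOperator(
        task_id="add_steps",
        # Pass the Amazon EMR cluster ID (for example, j-XXXXXXXXXXXXX), not the
        # cluster name or ARN. If an EmrCreateJobFlowOperator task creates the
        # cluster in the same DAG, use that task's output instead of a hardcoded ID.
        job_flow_id="j-EXAMPLECLUSTERID",
        steps=SPARK_STEPS,
        aws_conn_id="aws_default",
    )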
An error occurred (ValidationException) when calling the AddJobFlowSteps operation: A job flow that is shutting down, terminated, or finished may not be modified
This error occurs when the AddJobFlowSteps operation can't be performed because the Amazon EMR job flow is shutting down, terminated, or finished. To troubleshoot this error, use AWS CloudTrail to check when the cluster was terminated, and check the task logs to see when the EmrAddStepsOperator task started. If the cluster termination was triggered before the EmrAddStepsOperator task ran, then this error occurs.
To resolve this error, set KeepJobFlowAliveWhenNoSteps to True in the JOB_FLOW_OVERRIDES parameter of the DAG code. This setting keeps the cluster in a Waiting state after the steps are completed.
Example:
JOB_FLOW_OVERRIDES = { "Name": "Data-Pipeline-" + execution_date, "ReleaseLabel": "emr-5.29.0", "LogUri": "s3://{}/logs/emr/".format(S3_BUCKET_NAME), "Instances": { "InstanceGroups": [ { "Name": "Master nodes", "Market": "ON_DEMAND", "InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1 }, { "Name": "Slave nodes", "Market": "ON_DEMAND", "InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2 } ], "TerminationProtected": False, "KeepJobFlowAliveWhenNoSteps": True } }
To terminate the cluster after steps are completed, use one of the following methods:
Use EmrTerminateJobFlowOperator to create a final task in the DAG that terminates the cluster after the steps are completed:
cluster_remover = EmrTerminateJobFlowOperator(
    task_id='remove_cluster',
    job_flow_id=cluster_creator.output,
    aws_conn_id='aws_default',
)
Or, set an auto-termination policy in JOB_FLOW_OVERRIDES that terminates the cluster after a specified idle time. The following example configuration terminates the cluster after 60 minutes of idle time:
JOB_FLOW_OVERRIDES = { "Name": "Data-Pipeline-" + execution_date, "ReleaseLabel": "emr-<version>", "LogUri": "s3://{}/logs/emr/".format(example-s3-bucket-name), "Instances": { "InstanceGroups": [ { "Name": "Master nodes", "Market": "ON_DEMAND", "InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1 }, { "Name": "Slave nodes", "Market": "ON_DEMAND", "InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2 } ], "TerminationProtected": False, "KeepJobFlowAliveWhenNoSteps": True }, "AutoTerminationPolicy": {"IdleTimeout": 3600} }
For more information, see Defining tasks on the Apache Airflow website.
To view other parameters that you can set in JOB_FLOW_OVERRIDES, see run_job_flow.
For more information on Amazon EMR auto-termination, see Using an auto-termination policy for Amazon EMR cluster cleanup.
"No module named 'airflow.providers.amazon.aws.operators.emr_add_steps'"
If your Amazon MWAA environment and Amazon EMR cluster are in different Regions, then create another connection ID that uses the AWS Region where you want to launch the cluster. To create the connection ID and set its default Region, complete the following steps:
- Open the Apache Airflow UI.
- From the top navigation pane, choose Admin, choose Connections, and then choose the + button.
- Enter a name for the connection ID that you want to use in the DAG.
- For Connection Type, choose Amazon Web Services.
- In the Extra field, add the following JSON:
{"region_name": "example-region"}
Note: Replace example-region with the Region that you want to use as the default Region.
- Make sure that you use your new connection ID for each Amazon EMR operator that's used in Amazon MWAA.
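For example, the following minimal sketch assumes that you named the new connection ID aws_emr_example_region; the DAG ID, cluster ID, and step definition are placeholders:
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

with DAG(
    dag_id="emr_cross_region_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    step_adder = EmrAddStepsOperator(
        task_id="add_steps",
        job_flow_id="j-EXAMPLECLUSTERID",  # Cluster in the other Region
        steps=[
            {
                "Name": "example-step",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {"Jar": "command-runner.jar", "Args": ["echo", "hello"]},
            }
        ],
        aws_conn_id="aws_emr_example_region",  # The new connection ID, not aws_default
    )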
No module named 'airflow.providers.amazon.aws.operators.emr_add_steps'
This is a DAG import error. DAG import errors occur when the required package isn't installed, the import path is incorrect, or the operator name is incorrect. To troubleshoot these errors, confirm which apache-airflow-providers-amazon package version is installed in your Amazon MWAA environment. To do this, check the provider page in the Apache Airflow UI, or see Apache Airflow provider packages installed on Amazon MWAA environments. Then, check the import path and operator name in the Apache Airflow provider documentation for that package version. For example, Amazon MWAA 2.2.2 has apache-airflow-providers-amazon 2.4.0. To check the import path and operator name for version 2.4.0, see airflow.providers.amazon.aws on the Apache Airflow website.
The following are import paths for each Amazon EMR operator:
- EmrAddStepsOperator: airflow.providers.amazon.aws.operators.emr_add_steps
- EmrCreateJobFlowOperator: airflow.providers.amazon.aws.operators.emr_create_job_flow
- EmrTerminateJobFlowOperator: airflow.providers.amazon.aws.operators.emr_terminate_job_flow
Note: For Amazon MWAA 2.8.1, which has apache-airflow-providers-amazon 8.16.0, all Amazon EMR operators are in the common airflow.providers.amazon.aws.operators.emr module. For more information, see airflow.providers.amazon.aws.operators.emr on the Apache Airflow website.
To import any Amazon EMR operator, use the airflow.providers.amazon.aws.operators.emr path:
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator, EmrCreateJobFlowOperator, EmrTerminateJobFlowOperator
Note: Make sure you modify the DAG import statements based on your use case.