How do I troubleshoot AWS Marketplace connection errors in my AWS Glue ETL jobs?


When I use AWS Marketplace connectors in AWS Glue, I receive errors in the logs.

Resolution

Connectors aren't showing

You subscribe to a connector from AWS Marketplace, but you can't find the connector on the Connectors page in AWS Glue Studio.

To resolve this issue, complete the following steps:

  1. Open the AWS Marketplace.
  2. Choose Discover products, and then find the connector that you want to use.
  3. Choose Continue to Subscribe, and then log in to your AWS account if prompted.
  4. Choose Continue to Configure. If this option is grayed out and you can't choose it, then make sure to read the Terms and Conditions. Choose Accept Terms, and then wait until the Continue to Configure button becomes available.
  5. From the dropdown lists, choose a Delivery Method and a Software Version. If you're not sure which version to choose, then choose the latest version.
  6. Choose Continue to Launch, and then choose Usage Instruction.
  7. In the pop-up window, choose Activate the Glue connector from AWS Glue Studio.
  8. (Optional) To install only the connector, choose Activate connector only. For more information, see Using connectors and connections with AWS Glue Studio. If you use custom connectors instead, then see Developing custom connectors.

Note: You can repeat these steps even if you previously subscribed to the connector.

Issues with your IAM role

When you try to subscribe to a connector in the AWS Marketplace, you get an AWS Identity and Access Management (IAM) permissions error message similar to the following:

"You do not have the right permissions to make this request. Some controls have been disabled because you are missing the correct permission(s). The missing permission(s) are: aws-marketplace:Subscribe."

To resolve this issue, attach an IAM policy to the IAM user that received the error. For AWS Marketplace, attach one of the following AWS managed policies, depending on the level of access that the user needs:

  • To grant permissions to view subscriptions but not change them, choose AWSMarketplaceRead-only.
  • To grant permissions to subscribe and unsubscribe, choose AWSMarketplaceManageSubscriptions.
  • To grant complete control of your subscriptions, choose AWSMarketplaceFullAccess.

For more information, see Controlling access to AWS Marketplace subscriptions.
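If you manage IAM users with the AWS SDK for Python (Boto3), the following minimal sketch attaches one of these AWS managed policies. It assumes the standard `arn:aws:iam::aws:policy/<PolicyName>` ARN format for AWS managed policies. The helper name and the `read`/`manage`/`full` keys are hypothetical, and the client is passed in so that you can call the function with `boto3.client("iam")`.

```python
from typing import Any

# AWS managed policies for AWS Marketplace subscriptions. The keys are
# hypothetical access-level names used only in this example.
MARKETPLACE_POLICIES = {
    "read": "arn:aws:iam::aws:policy/AWSMarketplaceRead-only",
    "manage": "arn:aws:iam::aws:policy/AWSMarketplaceManageSubscriptions",
    "full": "arn:aws:iam::aws:policy/AWSMarketplaceFullAccess",
}


def attach_marketplace_policy(iam_client: Any, user_name: str, access: str = "manage") -> str:
    """Attach the chosen AWS Marketplace managed policy to an IAM user.

    iam_client is expected to be a Boto3 IAM client, such as boto3.client("iam").
    Returns the ARN of the policy that was attached.
    """
    if access not in MARKETPLACE_POLICIES:
        raise ValueError(f"access must be one of {sorted(MARKETPLACE_POLICIES)}")
    policy_arn = MARKETPLACE_POLICIES[access]
    iam_client.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)
    return policy_arn
```

For example, `attach_marketplace_policy(boto3.client("iam"), "example-user")` attaches AWSMarketplaceManageSubscriptions, which includes the `aws-marketplace:Subscribe` permission named in the error message. `example-user` is a placeholder.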

AccessDeniedException errors

In the AWS Glue job's logs, you receive an AccessDeniedException error message similar to the following:

"An error occurred (AccessDeniedException) when calling the GetAuthorizationToken operation: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/<IamRole>/GlueJobRunnerSession is not authorized to perform: ecr:GetAuthorizationToken on resource: * because no identity-based policy allows the ecr:GetAuthorizationToken action
Glue ETL Marketplace - failed to download connector, activation script exited with code 1
LAUNCH ERROR | Glue ETL Marketplace - failed to download connector. Please refer logs for details."

This error occurs when the IAM role that's associated with your AWS Glue job doesn't have permission to call the Amazon ECR GetAuthorizationToken operation. The job needs this permission to download the connector image from Amazon ECR.
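To find out which role to fix, you can extract the role name and the denied action from the log message. The following standard-library sketch assumes that the message contains an assumed-role ARN in the format that's shown in the preceding example. `find_denied_role` is a hypothetical helper, not part of any AWS library.

```python
import re
from typing import Optional, Tuple

# Matches the assumed-role ARN and the denied action in an
# AccessDeniedException message from an AWS Glue job log.
_DENIED = re.compile(
    r"arn:aws:sts::(?P<account>[^:]+):assumed-role/(?P<role>[^/]+)/\S+"
    r" is not authorized to perform: (?P<action>\S+)"
)


def find_denied_role(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (role_name, action) from an AccessDeniedException message, or None."""
    match = _DENIED.search(log_text)
    if match is None:
        return None
    return match.group("role"), match.group("action")
```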

To resolve this issue, give your AWS Glue job the ecr:GetAuthorizationToken permission:

  1. Open the IAM console.
  2. Choose the IAM role that you use in the AWS Glue job.
  3. Choose Attach policies.
  4. Under Filter policies, enter AmazonEC2ContainerRegistryReadOnly, and then choose this policy.
  5. Choose Attach Policy.
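The same change can be scripted with Boto3. This sketch uses a hypothetical helper, `ensure_ecr_read_only`, that checks whether the managed policy is already attached before it calls `attach_role_policy`, so it's safe to run more than once. It assumes that `iam_client` is `boto3.client("iam")` and that your credentials allow listing and attaching role policies.

```python
from typing import Any

ECR_READ_ONLY_ARN = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"


def ensure_ecr_read_only(iam_client: Any, role_name: str) -> bool:
    """Attach AmazonEC2ContainerRegistryReadOnly to the role if it's missing.

    Returns True if the policy was attached, or False if it was already attached.
    """
    # list_attached_role_policies can return results across several pages.
    paginator = iam_client.get_paginator("list_attached_role_policies")
    for page in paginator.paginate(RoleName=role_name):
        for policy in page["AttachedPolicies"]:
            if policy["PolicyArn"] == ECR_READ_ONLY_ARN:
                return False
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=ECR_READ_ONLY_ARN)
    return True
```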

After you attach the required policy to the IAM role, run the AWS Glue job again.

For more information, see AmazonEC2ContainerRegistryReadOnly, Adding IAM identity permissions (console), and Setting up IAM permissions for AWS Glue.

Networking issues - No network pathway from VPC

Your network configuration might prevent AWS Glue connectors from working correctly in an AWS Glue job. You might get an error message similar to the following:

"botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://api.ecr.us-east-1.amazonaws.com/"
Glue ETL Marketplace - failed to download connector, activation script exited with code 1
LAUNCH ERROR | Glue ETL Marketplace - failed to download connector. Please refer logs for details.
Exception in thread "main" java.lang.Exception: Glue ETL Marketplace - failed to download connector."

The preceding example error message indicates that there's no network path from the virtual private cloud (VPC) that contains the job's components to the Amazon Elastic Container Registry (Amazon ECR) repository that stores the connector images. AWS Glue stores all connectors in an Amazon ECR repository in the us-east-1 AWS Region, so an AWS Glue job must download a connector from this Region before it can use it.
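To confirm whether a subnet has a network path to Amazon ECR, one option is to launch a test EC2 instance in the same subnet and security group as the connection, and then try to open a TCP connection to the endpoint from the error message. The following standard-library sketch assumes that Python is available on that instance. `can_reach` is a hypothetical helper; a successful connection only shows that a route exists and doesn't test IAM permissions.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts,
        # because socket.timeout is a subclass of OSError.
        return False
```

For example, on the test instance, `can_reach("api.ecr.us-east-1.amazonaws.com")` returning `False` is consistent with the ConnectTimeoutError in the preceding log.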

When you add a connection to an AWS Glue job, you must establish a network route that allows traffic to flow to and from the required services. AWS Glue uses only private IP addresses to communicate with the job's components and with services such as Amazon ECR. An internet gateway can't route traffic for resources that have only private IP addresses, so this error can occur if your connection uses a public subnet with an internet gateway in its route table. For more information, see Configuration for internet access.
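To check whether a connection's subnet sends internet-bound traffic to an internet gateway, you can inspect its route table. This sketch classifies the `0.0.0.0/0` route in one entry of the `RouteTables` list that the EC2 `DescribeRouteTables` API returns. `default_route_target` is a hypothetical helper, and it only looks at IPv4 routes.

```python
from typing import Any, Dict


def default_route_target(route_table: Dict[str, Any]) -> str:
    """Classify the 0.0.0.0/0 route of a route table from EC2 DescribeRouteTables.

    Returns "nat-gateway", "internet-gateway", "other", or "none".
    """
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId"):
            return "nat-gateway"
        # GatewayId can also hold "local" or an endpoint ID, so check the
        # "igw-" prefix that internet gateway IDs use.
        if str(route.get("GatewayId", "")).startswith("igw-"):
            return "internet-gateway"
        return "other"
    return "none"
```

For example, pass each entry from `boto3.client("ec2").describe_route_tables(Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}])["RouteTables"]`. A result of `internet-gateway` matches the public-subnet problem that's described in this section. If the filter returns no route tables, then the subnet uses the VPC's main route table.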

When you create the connection, the networking information, such as the VPC, subnet, and security group, is optional. If you create the connection with only the connector and an AWS Secrets Manager key, then the AWS Glue job uses an internal NAT gateway and doesn't rely on a NAT gateway in your account.

To resolve this issue, choose one of the following solutions, and incorporate it into your network design.

Create and attach a NAT gateway to the connection subnet

Don't use an internet gateway. Instead, create and attach a NAT gateway to the connection subnet:

  1. Allocate an Elastic IP address to your account. Keep the address unassociated so that you can use it for the NAT gateway.
  2. Create a NAT gateway in a public subnet, and choose the Elastic IP address that you allocated in step 1.
  3. Create a private subnet and a route table that has no internet gateway route, and associate the route table with the subnet. In the route table, add a route with the destination 0.0.0.0/0 that targets the NAT gateway. Or, associate one of your existing subnets with this route table. Make sure that the route table doesn't also contain an internet gateway route.
  4. Update the AWS Glue connection's subnet to use the private subnet.
  5. Run the AWS Glue job again to confirm that the error doesn't recur.
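Steps 1 through 3 can also be sketched with Boto3. This assumes that `ec2_client` is `boto3.client("ec2")`, that the private subnet is already associated with the route table that you pass in, and that this route table doesn't yet have a `0.0.0.0/0` route. `route_subnet_through_nat` is a hypothetical helper, and it doesn't update the AWS Glue connection (step 4).

```python
from typing import Any


def route_subnet_through_nat(ec2_client: Any, public_subnet_id: str, private_route_table_id: str) -> str:
    """Create a NAT gateway in a public subnet and route 0.0.0.0/0 from a private route table to it.

    Returns the NAT gateway ID.
    """
    # Step 1: allocate an Elastic IP address for use in a VPC.
    allocation_id = ec2_client.allocate_address(Domain="vpc")["AllocationId"]

    # Step 2: create the NAT gateway in the public subnet with that address.
    nat_id = ec2_client.create_nat_gateway(
        SubnetId=public_subnet_id, AllocationId=allocation_id
    )["NatGateway"]["NatGatewayId"]

    # Wait until the NAT gateway is available before routing traffic to it.
    ec2_client.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

    # Step 3: send the private subnet's default route to the NAT gateway.
    ec2_client.create_route(
        RouteTableId=private_route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_id,
    )
    return nat_id
```

If the route table already has a default route, for example one that targets an internet gateway, then `create_route` fails. In that case, `replace_route` with the same arguments points the existing route at the NAT gateway instead.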

Don't use VPC information in the connection

Don't include VPC information in the connection. Instead, let AWS Glue use its internal NAT gateway:

  1. Create a new connection for your connector in AWS Glue Studio.
  2. Specify only the Secrets Manager key. Don't add any VPC options, so that AWS Glue uses its internal NAT gateway instead of your subnet.
  3. Edit the AWS Glue job to use the new connection, and then run the job again.
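If you prefer to create the new connection with Boto3, one approach is to copy an existing connection's type and properties and leave out its network settings. In this sketch, `connection` is assumed to be the `Connection` dictionary from the AWS Glue `GetConnection` API, and `connection_input_without_vpc` is a hypothetical helper. It copies only the name, type, description, and properties, which is an assumption about what your connector needs.

```python
from typing import Any, Dict


def connection_input_without_vpc(connection: Dict[str, Any], new_name: str) -> Dict[str, Any]:
    """Build a ConnectionInput that copies an existing connection without its VPC settings."""
    connection_input: Dict[str, Any] = {
        "Name": new_name,
        "ConnectionType": connection["ConnectionType"],
        "ConnectionProperties": dict(connection.get("ConnectionProperties", {})),
    }
    if connection.get("Description"):
        connection_input["Description"] = connection["Description"]
    # PhysicalConnectionRequirements (subnet, security groups, Availability Zone)
    # is deliberately left out so that the connection carries no VPC options.
    return connection_input
```

For example, `glue.create_connection(ConnectionInput=connection_input_without_vpc(glue.get_connection(Name="my-connection")["Connection"], "my-connection-no-vpc"))`, where `glue` is `boto3.client("glue")` and both connection names are placeholders.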

For private network setups, create a VPC endpoint

Use a VPC endpoint with your private network setup instead of a NAT gateway. To use a VPC endpoint, complete the following steps.

Create VPC endpoint

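First, create an Amazon ECR API endpoint. Next, create a VPC endpoint for the com.amazonaws.<region>.ecr.dkr service. Then, create an Amazon Simple Storage Service (Amazon S3) gateway endpoint. Amazon ECR stores image layers in Amazon S3, so the job needs the Amazon S3 endpoint to download the connector image.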

Create the Amazon ECR API endpoint:

  1. Open the Amazon VPC console.
  2. From the navigation pane, choose Endpoints.
  3. Choose Create endpoint, and then add an endpoint name for your Amazon ECR API endpoint.
  4. For Service category, choose AWS services.
  5. For Services, add the ECR filter, and then choose com.amazonaws.<region>.ecr.api.
  6. For VPC, select the VPC that you want to create the endpoint in. Under Additional settings, choose Enable DNS Name.
  7. For Subnets, choose the Availability Zone that you created the new subnet in.
  8. For Subnet ID, choose the Subnet name.
  9. For Security groups, choose your security group.
  10. For Policy, choose Full access to allow all operations by all principals on all resources over the VPC endpoint.
  11. (Optional) Add a tag.
  12. Choose Create endpoint.

Complete the same steps to create another VPC endpoint for the service name com.amazonaws.<region>.ecr.dkr.
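Both interface endpoints can also be created with the EC2 `CreateVpcEndpoint` API. This Boto3 sketch assumes that `ec2_client` is `boto3.client("ec2")` and that you already know the VPC, subnet, and security group IDs. `create_ecr_interface_endpoints` is a hypothetical helper. It doesn't pass an endpoint policy, which leaves the default full-access policy in place, the same as choosing Full access in the console.

```python
from typing import Any, Dict, List


def create_ecr_interface_endpoints(
    ec2_client: Any,
    region: str,
    vpc_id: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
) -> Dict[str, str]:
    """Create the ecr.api and ecr.dkr interface endpoints; return {service_name: endpoint_id}."""
    created: Dict[str, str] = {}
    for suffix in ("ecr.api", "ecr.dkr"):
        service_name = f"com.amazonaws.{region}.{suffix}"
        response = ec2_client.create_vpc_endpoint(
            VpcEndpointType="Interface",
            VpcId=vpc_id,
            ServiceName=service_name,
            SubnetIds=subnet_ids,
            SecurityGroupIds=security_group_ids,
            # Corresponds to the DNS name option in the console.
            PrivateDnsEnabled=True,
        )
        created[service_name] = response["VpcEndpoint"]["VpcEndpointId"]
    return created
```

The security group that you pass in must allow inbound HTTPS (TCP port 443) from the subnet so that the job can reach the endpoints.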

Complete the following steps to create the Amazon S3 endpoint:

  1. Open the Amazon VPC console.
  2. From the navigation pane, choose Endpoints.
  3. Choose Create endpoint, and then add an endpoint name.
  4. For Service category, choose AWS services.
  5. For Services, add the Type:Gateway filter, and then choose com.amazonaws.<region>.s3.
  6. For VPC, choose the VPC that you want to create the endpoint in.
  7. For Route tables, choose your route tables.
  8. For Policy, choose Full access to allow all operations by all principals on all resources over the VPC endpoint.
  9. (Optional) Add a tag.
  10. Choose Create endpoint.
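The Amazon S3 gateway endpoint can be sketched the same way. Gateway endpoints are attached to route tables instead of subnets, so this hypothetical helper takes route table IDs; pass the route tables that are associated with the connection's subnet. It assumes that `ec2_client` is `boto3.client("ec2")`.

```python
from typing import Any, List


def create_s3_gateway_endpoint(ec2_client: Any, region: str, vpc_id: str, route_table_ids: List[str]) -> str:
    """Create an Amazon S3 gateway endpoint and return its endpoint ID."""
    response = ec2_client.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.s3",
        # The endpoint adds an S3 prefix-list route to each of these route tables.
        RouteTableIds=route_table_ids,
    )
    return response["VpcEndpoint"]["VpcEndpointId"]
```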

Subscribe to and configure connectors

If you already subscribed to and configured your connector in AWS Glue, then proceed to the Create an AWS Glue connection section.

If you didn't subscribe to and configure your connector in AWS Glue, then follow the steps in Subscribing to AWS Marketplace connectors. In the Usage instruction pop-up window, choose Activate the Glue connector from AWS Glue Studio to open the Create Glue Connection page.

Create an AWS Glue connection

If you already added your connector in the AWS Glue console, then navigate to Connections, choose your connector, and then choose Create connection.

If you followed the previous steps to subscribe to and configure connectors, then complete the following steps to create your connection:

  1. Open the AWS Glue console.
  2. On the Create Glue Connection page, add a Connection name.
  3. For Network options, choose your VPC and the subnet and security groups.
  4. Choose Create connection and activate connector.
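If you script the connection instead, the network options correspond to the `PhysicalConnectionRequirements` field of the AWS Glue `ConnectionInput` structure. This sketch builds that field from a subnet entry returned by the EC2 `DescribeSubnets` API so that the Availability Zone always matches the subnet. `physical_requirements` is a hypothetical helper.

```python
from typing import Any, Dict, List


def physical_requirements(subnet: Dict[str, Any], security_group_ids: List[str]) -> Dict[str, Any]:
    """Build PhysicalConnectionRequirements from a subnet returned by EC2 DescribeSubnets."""
    return {
        "SubnetId": subnet["SubnetId"],
        # Use the subnet's own Availability Zone so that the two values agree.
        "AvailabilityZone": subnet["AvailabilityZone"],
        "SecurityGroupIdList": list(security_group_ids),
    }
```

You could then include the result as `PhysicalConnectionRequirements` in the `ConnectionInput` that you pass to `create_connection`, or to `update_connection` for an existing connection. `update_connection` replaces the connection definition, so include the connection's other fields too.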

Networking issues - too many connections in the AWS Glue job

If your AWS Glue job has too many connections, then you might receive an error message similar to the following in the job's logs:

"INFO - Glue ETL Marketplace - Start downloading connector jars for connection: <connection name>
test connection feature: "Caused by: com.amazonaws.services.glue.exceptions.InvalidInputException: Connection: does not exist"
LAUNCH ERROR | Glue ETL Marketplace - failed to download connector. Please refer logs for details."

AWS Glue supports one connection per job or development endpoint. If you specify more than one connection in a job, then AWS Glue uses the first connection. If you must access more than one VPC, then see Connect to and run ETL jobs across multiple VPCs using a dedicated AWS Glue VPC.
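To check a job for this issue, you can look at the connection list in its definition. This sketch assumes that `job` is the `Job` dictionary from the AWS Glue `GetJob` API, where the connection names are in `job["Connections"]["Connections"]`. `extra_connections` is a hypothetical helper.

```python
from typing import Any, Dict, List


def extra_connections(job: Dict[str, Any]) -> List[str]:
    """Return the connections after the first one, which AWS Glue doesn't use."""
    names = job.get("Connections", {}).get("Connections", [])
    return list(names[1:])
```

For example, `extra_connections(boto3.client("glue").get_job(JobName="my-job")["Job"])` returns the names of any connections after the first one, where `my-job` is a placeholder. If the list isn't empty, then remove those connections from the job.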

AWS OFFICIAL
Updated 2 months ago