Glue job cannot download the Hudi connector: 403 Forbidden (IAM role has full access to EC2ContainerRegistry and Marketplace)


I followed this blog to try the Hudi connector: Ingest streaming data to Apache Hudi tables using AWS Glue and Apache Hudi DeltaStreamer.

But whenever I start the Glue job, I get this error log:

2023-03-28 12:39:33,136 - __main__ - INFO - Glue ETL Marketplace - Preparing layer url and gz file path to store layer 8de5b65bd171294b1e04e0df439f4ea11ce923b642eddf3b3d76d297bfd2670c.
2023-03-28 12:39:33,136 - __main__ - INFO - Glue ETL Marketplace - Getting the layer file 8de5b65bd171294b1e04e0df439f4ea11ce923b642eddf3b3d76d297bfd2670c and store it as gz.
Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 361, in <module>
    main()
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 351, in main
    res += download_jars_per_connection(conn, region, endpoint, proxy)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 304, in download_jars_per_connection
    download_and_unpack_docker_layer(ecr_url, layer["digest"], dir_prefix, http_header)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 168, in download_and_unpack_docker_layer
    layer = send_get_request(layer_url, header)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 80, in send_get_request
    response.raise_for_status()
  File "/home/spark/.local/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://prod-us-east-1-starport-layer-bucket.s3.us-east-1.amazonaws.com/6a636e-709825985650-a6bdf6d5-eba8-e643-536c-26147c8be5f0/84e9f346-bf80-4532-ac33-b00f5dbfa546?X-Amz-Security-Token=....Ks4HlEAQcC0PUIFipDGrNhcEAVTZQ%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20230328T123933Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=%2F20230328%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=c28f35ab3b3c
Glue ETL Marketplace - failed to download connector, activation script exited with code 1
LAUNCH ERROR | Glue ETL Marketplace - failed to download connector.Please refer logs for details.
Exception in thread "main" 
java.lang.Exception: Glue ETL Marketplace - failed to download connector.
	at com.amazonaws.services.glue.PrepareLaunch.downloadConnectorJar(PrepareLaunch.scala:1043)
	at com.amazonaws.services.glue.PrepareLaunch.com$amazonaws$services$glue$PrepareLaunch$$prepareCmd(PrepareLaunch.scala:759)
	at com.amazonaws.services.glue.PrepareLaunch$.main(PrepareLaunch.scala:42)
	at com.amazonaws.services.glue.PrepareLaunch.main(PrepareLaunch.scala)

I suspect the root cause is one of the following:

  1. The Glue job cannot pull the connector image from AWS Marketplace.
  2. The connector image cannot be stored in the S3 bucket.

So I tried these fixes:

  1. Granted permissions to the job's IAM role: AWSMarketplaceFullAccess, AmazonEC2ContainerRegistryFullAccess, and AmazonS3FullAccess. I believe these permissions should be more than enough.
  2. Made the S3 bucket public: I turned off Block Public Access on the related S3 bucket.

But even after doing this, I still get the same error. Does anyone have any suggestions?

Asked by donglai a year ago.
Accepted Answer

If you are using Glue 3.0 or later, the best way to add Hudi support nowadays is simply to add the job parameter --datalake-formats=hudi instead of depending on the Marketplace connector:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html
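As a rough sketch of what that looks like when the job is defined in code, the helper below builds the keyword arguments for boto3's Glue `create_job` call with `--datalake-formats` set to `hudi` in `DefaultArguments`. It deliberately attaches no `Connections`: the traceback shows the failure happening in `download_jars_per_connection`, so presumably the Marketplace connection also needs to be detached from the job. The helper name, job name, role ARN, and script path are hypothetical placeholders, and the Glue version default of 4.0 is an assumption; adjust them to your setup.

```python
"""Hypothetical helper: build boto3 Glue create_job kwargs that enable Hudi
through the --datalake-formats job parameter instead of a Marketplace connector."""

import json


def build_hudi_job_kwargs(job_name, role_arn, script_location, glue_version="4.0"):
    """Return kwargs for glue.create_job(); names passed in are placeholders."""
    # --datalake-formats is only supported on Glue 3.0 and later.
    if float(glue_version) < 3.0:
        raise ValueError("--datalake-formats requires Glue 3.0 or later")
    return {
        "Name": job_name,
        "Role": role_arn,
        "GlueVersion": glue_version,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Use the Hudi libraries bundled with Glue; no connector download needed.
            "--datalake-formats": "hudi",
        },
        # No "Connections" key: the Marketplace Hudi connection is intentionally not attached.
    }


if __name__ == "__main__":
    kwargs = build_hudi_job_kwargs(
        "hudi-ingest-job",                                  # hypothetical job name
        "arn:aws:iam::123456789012:role/ExampleGlueRole",   # hypothetical role ARN
        "s3://example-bucket/scripts/hudi_job.py",          # hypothetical script path
    )
    print(json.dumps(kwargs, indent=2))
    # With AWS credentials configured, this could then be passed to boto3:
    # boto3.client("glue").create_job(**kwargs)
```

For an existing job, the same `"--datalake-formats": "hudi"` entry can instead be added under Job parameters in the Glue console.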

Answered by an AWS Expert a year ago; reviewed by Expert A_J 9 days ago.
  • Thank you for your prompt reply! This problem has been resolved!
