I followed this blog post to try the Hudi connector: Ingest streaming data to Apache Hudi tables using AWS Glue and Apache Hudi DeltaStreamer.
But whenever I start the Glue job, I get this error log:
2023-03-28 12:39:33,136 - __main__ - INFO - Glue ETL Marketplace - Preparing layer url and gz file path to store layer 8de5b65bd171294b1e04e0df439f4ea11ce923b642eddf3b3d76d297bfd2670c.
2023-03-28 12:39:33,136 - __main__ - INFO - Glue ETL Marketplace - Getting the layer file 8de5b65bd171294b1e04e0df439f4ea11ce923b642eddf3b3d76d297bfd2670c and store it as gz.
Traceback (most recent call last):
File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 361, in <module>
main()
File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 351, in main
res += download_jars_per_connection(conn, region, endpoint, proxy)
File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 304, in download_jars_per_connection
download_and_unpack_docker_layer(ecr_url, layer["digest"], dir_prefix, http_header)
File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 168, in download_and_unpack_docker_layer
layer = send_get_request(layer_url, header)
File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 80, in send_get_request
response.raise_for_status()
File "/home/spark/.local/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://prod-us-east-1-starport-layer-bucket.s3.us-east-1.amazonaws.com/6a636e-709825985650-a6bdf6d5-eba8-e643-536c-26147c8be5f0/84e9f346-bf80-4532-ac33-b00f5dbfa546?X-Amz-Security-Token=....Ks4HlEAQcC0PUIFipDGrNhcEAVTZQ%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20230328T123933Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=%2F20230328%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=c28f35ab3b3c
Glue ETL Marketplace - failed to download connector, activation script exited with code 1
LAUNCH ERROR | Glue ETL Marketplace - failed to download connector.Please refer logs for details.
Exception in thread "main"
java.lang.Exception: Glue ETL Marketplace - failed to download connector.
at com.amazonaws.services.glue.PrepareLaunch.downloadConnectorJar(PrepareLaunch.scala:1043)
at com.amazonaws.services.glue.PrepareLaunch.com$amazonaws$services$glue$PrepareLaunch$$prepareCmd(PrepareLaunch.scala:759)
at com.amazonaws.services.glue.PrepareLaunch$.main(PrepareLaunch.scala:42)
at com.amazonaws.services.glue.PrepareLaunch.main(PrepareLaunch.scala)
I guess the root cause is one of these:
- The Glue job cannot pull the connector image from AWS Marketplace.
- The connector image cannot be stored in the S3 bucket.
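One way to test these guesses is to look at the URL that returned the 403. The sketch below only uses the Python standard library to split a presigned S3 URL into the bucket, the region, and the signing and expiry times. The function name `describe_presigned_url` and the shortened `sample` URL (with a placeholder key `example-key`) are made up for illustration; the host and the `X-Amz-Date` / `X-Amz-Expires` values are copied from the log above. It assumes a virtual-hosted-style URL of the form `<bucket>.s3.<region>.amazonaws.com`.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def describe_presigned_url(url):
    """Extract bucket, region, and validity window from a presigned S3 URL.

    Hypothetical helper; assumes a virtual-hosted-style host name
    (<bucket>.s3.<region>.amazonaws.com) and SigV4 query parameters.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    host = parts.hostname or ""

    # Everything before ".s3." is the bucket; the next label is the region.
    bucket, _, rest = host.partition(".s3.")
    region = rest.split(".")[0] if rest else None

    # X-Amz-Date is the signing time in UTC; X-Amz-Expires is in seconds.
    signed_at = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    expires_in = int(query["X-Amz-Expires"][0])

    return {
        "bucket": bucket,
        "region": region,
        "signed_at": signed_at,
        "expires_at": signed_at + timedelta(seconds=expires_in),
    }


# Shortened stand-in for the URL in the log; "example-key" is a placeholder.
sample = (
    "https://prod-us-east-1-starport-layer-bucket.s3.us-east-1.amazonaws.com/"
    "example-key?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20230328T123933Z&X-Amz-Expires=3600"
)
info = describe_presigned_url(sample)
print(info["bucket"], info["region"], info["expires_at"].isoformat())
```

For the URL in the log, the bucket comes out as `prod-us-east-1-starport-layer-bucket`, which does not look like a bucket in the job's own account, so permissions or public access on the job's own bucket may not matter here. The signing time also matches the log timestamp, so an expired URL seems unlikely. If the job runs inside a VPC with an S3 endpoint, the endpoint policy might be worth checking as well; that is only an assumption and is not confirmed by the log.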
So I tried these methods:
- Grant permissions to the job's IAM role. I attached AWSMarketplaceFullAccess, AmazonEC2ContainerRegistryFullAccess, and AmazonS3FullAccess, which I think should definitely be enough.
- Make the S3 bucket public. I turned off Block public access on the related S3 bucket.
But even after doing this, I still get the same error. Can anyone give any suggestions?
Thank you for your prompt reply! This problem has been successfully resolved!