Glue job cannot download the Hudi connector: 403 Forbidden (IAM role has full access to EC2ContainerRegistry and Marketplace)

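[The following question has been translated.] I am trying to use the Hudi connector by following this blog post: Ingest streaming data to Apache Hudi tables using AWS Glue and Apache Hudi DeltaStreamer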

However, every time I start the Glue job, I get the following error log:

2023-03-28 12:39:33,136 - __main__ - INFO - Glue ETL Marketplace - Preparing layer url and gz file path to store layer 8de5b65bd171294b1e04e0df439f4ea11ce923b642eddf3b3d76d297bfd2670c.
2023-03-28 12:39:33,136 - __main__ - INFO - Glue ETL Marketplace - Getting the layer file 8de5b65bd171294b1e04e0df439f4ea11ce923b642eddf3b3d76d297bfd2670c and store it as gz.
Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 361, in <module>
    main()
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 351, in main
    res += download_jars_per_connection(conn, region, endpoint, proxy)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 304, in download_jars_per_connection
    download_and_unpack_docker_layer(ecr_url, layer["digest"], dir_prefix, http_header)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 168, in download_and_unpack_docker_layer
    layer = send_get_request(layer_url, header)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 80, in send_get_request
    response.raise_for_status()
  File "/home/spark/.local/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://prod-us-east-1-starport-layer-bucket.s3.us-east-1.amazonaws.com/6a636e-709825985650-a6bdf6d5-eba8-e643-536c-26147c8be5f0/84e9f346-bf80-4532-ac33-b00f5dbfa546?X-Amz-Security-Token=....Ks4HlEAQcC0PUIFipDGrNhcEAVTZQ%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20230328T123933Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=%2F20230328%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=c28f35ab3b3c
Glue ETL Marketplace - failed to download connector, activation script exited with code 1
LAUNCH ERROR | Glue ETL Marketplace - failed to download connector.Please refer logs for details.
Exception in thread "main" 
java.lang.Exception: Glue ETL Marketplace - failed to download connector.
	at com.amazonaws.services.glue.PrepareLaunch.downloadConnectorJar(PrepareLaunch.scala:1043)
	at com.amazonaws.services.glue.PrepareLaunch.com$amazonaws$services$glue$PrepareLaunch$$prepareCmd(PrepareLaunch.scala:759)
	at com.amazonaws.services.glue.PrepareLaunch$.main(PrepareLaunch.scala:42)
	at com.amazonaws.services.glue.PrepareLaunch.main(PrepareLaunch.scala)

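My guess at the root cause is one of the following:

  1. The Glue job cannot pull the connector image from Marketplace.
  2. The connector image cannot be stored in S3.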

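So I tried the following:

  1. Granted the Glue job's execution role the relevant permissions. I attached the AWSMarketplaceFullAccess, AmazonEC2ContainerRegistryFullAccess, and AmazonS3FullAccess policies, which I believe should be more than sufficient.
  2. Made the S3 bucket publicly accessible by turning off its Block public access setting.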

Even after making these changes, I still get the same error. Can anyone give me some advice?
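
For anyone reading the traceback, a small sketch like the one below can help show which host actually returned the 403. It only parses the URL copied from the log (shortened here, with the elided security token left out) and makes no AWS calls. The helper name `describe_presigned_url` is hypothetical, and the conclusion is read off the URL alone: the request goes to a bucket named `prod-us-east-1-starport-layer-bucket`, which suggests that changing the public-access settings on one's own bucket may not affect this download.

```python
from urllib.parse import urlsplit, parse_qs


def describe_presigned_url(url):
    """Summarize the host, bucket name, and signing details of an S3 URL.

    Hypothetical helper for reading the log; it does not contact AWS.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    host = parts.hostname or ""
    # Virtual-hosted-style S3 URLs look like <bucket>.s3.<region>.amazonaws.com
    bucket = host.split(".s3.")[0] if ".s3." in host else None
    expires = query.get("X-Amz-Expires")
    return {
        "host": host,
        "bucket": bucket,
        "presigned": "X-Amz-Signature" in query,
        "expires_seconds": int(expires[0]) if expires else None,
    }


# URL taken from the error log, with the security token and credential omitted.
failing_url = (
    "https://prod-us-east-1-starport-layer-bucket.s3.us-east-1.amazonaws.com/"
    "6a636e-709825985650-a6bdf6d5-eba8-e643-536c-26147c8be5f0/"
    "84e9f346-bf80-4532-ac33-b00f5dbfa546"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20230328T123933Z"
    "&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Signature=c28f35ab3b3c"
)

info = describe_presigned_url(failing_url)
print(info)
```

The output only tells you where the request was sent and that it was a presigned request; it does not by itself explain why S3 rejected it.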

Expert
Asked 5 months ago · 26 views
1 Answer

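[The following answer has been translated.] If you are using Glue 3 or later, the best-supported approach now is simply to add the job parameter --datalake-formats=hudi instead of relying on the Marketplace connector. See: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html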
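
As a rough illustration of that suggestion, the sketch below builds the job arguments and passes them to boto3's `create_job`. Several things here are assumptions rather than part of the answer: the helper names, the `GlueVersion` value of "4.0", and the two Spark settings passed through `--conf`, which follow the linked documentation page as I recall it and should be checked against it. The job name, role ARN, and script location are placeholders you would supply.

```python
def build_hudi_default_arguments(extra_spark_conf=None):
    """Return Glue DefaultArguments that enable the built-in Hudi support.

    Hypothetical helper; the Spark settings are assumptions to verify
    against the linked AWS Glue Hudi documentation.
    """
    spark_conf = [
        "spark.serializer=org.apache.spark.serializer.KryoSerializer",
        "spark.sql.hive.convertMetastoreParquet=false",
    ]
    spark_conf.extend(extra_spark_conf or [])
    return {
        # The parameter named in the answer: key --datalake-formats, value hudi.
        "--datalake-formats": "hudi",
        # Glue accepts one --conf key, so extra settings are chained in its value.
        "--conf": " --conf ".join(spark_conf),
    }


def create_hudi_job(job_name, role_arn, script_location, glue_version="4.0"):
    """Create a Glue ETL job without any Marketplace connection attached."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    return glue.create_job(
        Name=job_name,
        Role=role_arn,
        GlueVersion=glue_version,  # assumption: any version >= 3.0 per the answer
        Command={
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        DefaultArguments=build_hudi_default_arguments(),
    )


hudi_args = build_hudi_default_arguments()
print(hudi_args)
```

With this setup the job definition has no Marketplace connection, so the connector download step that fails in the log should not be involved; the same two arguments can also be entered by hand under the job's parameters in the console.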

Expert
Answered 5 months ago
