2 Answers
1
To upgrade an AWS Glue 4.0 job to use Hudi 0.14, follow these steps:
- Download the Hudi 0.14 Spark bundle JAR, hudi-spark3.3-bundle_2.12-0.14.0.jar, from the Maven repository.
- Download the Spark Avro JAR, spark-avro_2.12-3.3.0.jar, from the Maven repository. Use the Scala 2.12 build so that it matches the Hudi bundle and the Spark 3.3 / Scala 2.12 runtime of Glue 4.0.
- Upload both JARs to an S3 bucket.
- In the AWS Glue console, select the ETL job and open Job details.
- Under Advanced properties > Libraries, in the Dependent JARs path box, enter the S3 URIs of both JARs, comma-separated, e.g. s3://<bucket_name>/hudi/hudi-0.14-jars/hudi-spark3.3-bundle_2.12-0.14.0.jar,s3://<bucket_name>/hudi/hudi-0.14-jars/spark-avro_2.12-3.3.0.jar
- Under Job parameters, add the key --extra-jars with the same comma-separated S3 URIs as its value.
- (Optional) If the job parameters already include the key --datalake-formats with the value hudi, remove it, because it conflicts with the Hudi JAR that you are passing explicitly in the Libraries section.
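The job-parameter changes in the last two steps can also be expressed as a small transformation of the job's argument map, which may help if you script the change instead of using the console. The following is only a sketch: the helper name `build_job_arguments`, the bucket name, and the Spark Avro file-name placeholder are hypothetical, and it assumes the job's current parameters are available as a plain Python dict.

```python
def build_job_arguments(existing_args, jar_uris):
    """Return a copy of a Glue job's parameters that points at explicitly supplied JARs."""
    args = dict(existing_args)  # copy, so the caller's dict is left unchanged

    # Remove the built-in Hudi integration; per the steps above it conflicts
    # with a Hudi bundle that is passed explicitly.
    if args.get("--datalake-formats") == "hudi":
        del args["--datalake-formats"]

    # --extra-jars takes the S3 URIs as one comma-separated string.
    args["--extra-jars"] = ",".join(jar_uris)
    return args


# Hypothetical values for illustration only.
BUCKET = "my-example-bucket"
PREFIX = f"s3://{BUCKET}/hudi/hudi-0.14-jars"
jars = [
    f"{PREFIX}/hudi-spark3.3-bundle_2.12-0.14.0.jar",
    f"{PREFIX}/<spark-avro-jar-file-name>",  # the Spark Avro JAR you uploaded
]

current = {"--datalake-formats": "hudi", "--job-language": "python"}
updated = build_job_arguments(current, jars)
print(updated)
```

The resulting dict is what you would enter under Job parameters. If you manage jobs programmatically, it could presumably be supplied as the job's default arguments, but verify that against the Glue API documentation for your setup.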
0
Hi,
You can follow this very detailed video to upgrade the Hudi version: https://www.youtube.com/watch?v=HJ6QQN408AE
Best,
Didier
