Hello,
I understand that you are trying to use AWS Glue to write a Hudi table, but it is failing with Glue version 4.
I followed the AWS documentation to create a Hudi table from sample data in S3 that has already been catalogued: [+] https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html
Job parameters provided:
--> Key: --conf, Value: spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.hive.convertMetastoreParquet=false
--> Key: --datalake-formats, Value: hudi
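As a sketch, the same job parameters can also be supplied when creating the job from the AWS CLI via --default-arguments. The job name, role ARN, and script location below are placeholders, not values from this thread:

```shell
# Hypothetical example: create a Glue 4.0 job with the Hudi job parameters
# from above baked in as default arguments. Replace the name, role, and
# script location with your own values.
aws glue create-job \
  --name my-hudi-write-job \
  --role arn:aws:iam::123456789012:role/MyGlueJobRole \
  --glue-version "4.0" \
  --command '{"Name":"glueetl","ScriptLocation":"s3://my-bucket/scripts/hudi_job.py"}' \
  --default-arguments '{
      "--datalake-formats": "hudi",
      "--conf": "spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.hive.convertMetastoreParquet=false"
  }'
```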
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the catalogued source data from S3
AmazonS3_node = glueContext.create_dynamic_frame.from_catalog(
    database="<db_name_in_catalog>",
    table_name="<table_name_in_catalog>",
    transformation_ctx="AmazonS3_node",
)
dataFrame = AmazonS3_node.toDF()

# Hudi write configuration
additional_options = {
    "hoodie.table.name": "<your_table_name>",
    "hoodie.datasource.write.storage.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "<your_recordkey_field>",
    "hoodie.datasource.write.precombine.field": "<your_precombine_field>",
    "hoodie.datasource.write.partitionpath.field": "<your_partitionkey_field>",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "<your_database_name>",
    "hoodie.datasource.hive_sync.table": "<your_table_name>",
    "hoodie.datasource.hive_sync.partition_fields": "<your_partitionkey_field>",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    "hoodie.datasource.hive_sync.use_jdbc": "false",
    "hoodie.datasource.hive_sync.mode": "hms",
    "path": "s3://<s3Path/>",
}
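A common failure mode with this options dict is a missing or still-placeholder hoodie key. A small hypothetical helper (not part of the Glue or Hudi API) can flag that before the write is attempted:

```python
# Hypothetical pre-flight check: verify the essential Hudi write options
# are present and no longer "<placeholder>" values before writing.
REQUIRED_HUDI_OPTIONS = [
    "hoodie.table.name",
    "hoodie.datasource.write.recordkey.field",
    "hoodie.datasource.write.precombine.field",
    "path",
]

def missing_hudi_options(options):
    """Return required option keys that are absent or still placeholders."""
    return [
        key for key in REQUIRED_HUDI_OPTIONS
        if key not in options or options[key].startswith("<")
    ]

# Example: a dict still holding a placeholder record key is flagged.
opts = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "<your_recordkey_field>",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "path": "s3://my-bucket/hudi/orders/",
}
print(missing_hudi_options(opts))  # ['hoodie.datasource.write.recordkey.field']
```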
# Write the DataFrame as a Hudi table; the chained calls need to be
# wrapped in parentheses (or use line continuations) to be valid Python
(
    dataFrame.write.format("hudi")
    .options(**additional_options)
    .mode("overwrite")
    .save()
)

job.commit()
--> Alternative way
Additionally, you can use AWS Glue Studio to automatically generate the code for writing a Hudi table to S3.
Thank You!
Hi, I was able to work with this solution, thank you. The error is resolved and the table structure is created, but unfortunately the records are not being inserted into the Hudi table. I cannot see any exception in the logs either. Can you give me a hint?
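One hint: with "hoodie.datasource.write.operation": "upsert", Hudi deduplicates incoming records by the record key and keeps only the row with the largest precombine value, so a wrong record-key field (e.g. one that is null or identical for every row) can silently collapse the input to almost nothing. A plain-Python sketch of that semantics (illustrative only, not Hudi's actual implementation):

```python
# Plain-Python sketch of Hudi upsert semantics: incoming rows are merged
# over the existing table state by record key, and among duplicates the
# row with the highest precombine value wins.
def hudi_upsert(existing, incoming, record_key, precombine):
    table = {row[record_key]: row for row in existing}
    for row in incoming:
        key = row[record_key]
        current = table.get(key)
        if current is None or row[precombine] >= current[precombine]:
            table[key] = row
    return list(table.values())

existing = [{"id": 1, "ts": 1, "val": "a"}]
incoming = [
    {"id": 1, "ts": 2, "val": "b"},   # newer version of id 1 wins
    {"id": 1, "ts": 0, "val": "c"},   # older version is dropped
    {"id": 2, "ts": 1, "val": "d"},   # new key is inserted
]
result = hudi_upsert(existing, incoming, "id", "ts")
print(sorted((r["id"], r["val"]) for r in result))  # [(1, 'b'), (2, 'd')]
```

It is also worth checking which field you pass as the record key and precombine field in the job's actual options, and reading the table back (e.g. with spark.read.format("hudi")) to confirm what was written.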