AWS Glue Hudi Connector job with error 'hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer'

I have subscribed to the AWS Marketplace Glue Hudi Connector and added a Hudi connector in Glue. I created a visual Glue ETL job that reads data from S3 and merges it into another S3 location using the Hudi connector. In the Hudi connector component, I added 'path' and 'hoodie.table.name' as connection options. But my job fails with the error "hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer". Can anyone help me with this?

asked 2 years ago · 947 views
1 Answer

Hello,

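Hudi requires spark.serializer to be set to KryoSerializer, and the error means the job is running with a different serializer. Please use the code below to set KryoSerializer when building the SparkSession: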

import sys
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import col, to_timestamp, monotonically_increasing_id, to_date, when
from awsglue.utils import getResolvedOptions
from pyspark.sql.types import *
from datetime import datetime

# 'curated_bucket' is a job parameter used by this example; the job must be
# started with --curated_bucket, otherwise getResolvedOptions raises an error.
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'curated_bucket'])

# Build the session with the Kryo serializer that Hudi requires. These settings
# must be applied here, before the SparkContext is created.
spark = (
    SparkSession.builder
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
    .config('spark.sql.hive.convertMetastoreParquet', 'false')
    .getOrCreate()
)

# Reuse the SparkContext of the configured session for Glue.
sc = spark.sparkContext
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
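
Since the job in the question is a visual job, it may be easier to pass the same two settings as a job parameter instead of editing the generated script. As far as I know, Glue accepts a job parameter with the key --conf, and further settings are chained inside its value with " --conf ". The sketch below only builds that value string; the helper name build_glue_conf_value is hypothetical and not part of any AWS or Hudi library, so please verify the result against your Glue version before relying on it.

```python
# Hypothetical helper (not an AWS or Hudi API): joins Spark settings into the
# single string used as the value of a Glue "--conf" job parameter.
def build_glue_conf_value(settings):
    # Each setting becomes "key=value". Job parameters are key/value pairs,
    # so the "--conf" key can appear only once; additional settings are
    # chained inside the value with " --conf ".
    return " --conf ".join(f"{key}={value}" for key, value in settings.items())


# The two settings from the answer above.
hudi_settings = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.hive.convertMetastoreParquet": "false",
}

conf_value = build_glue_conf_value(hudi_settings)
print(conf_value)
```

The printed string would go into the job parameters of the job details, with --conf as the key and the string as the value.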

Reference:

[1] https://hudi.apache.org/docs/quick-start-guide/

AWS
answered 2 years ago
