AWS Glue Hudi Connector job with error 'hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer'

I subscribed to the AWS Marketplace Glue Hudi Connector and added a Hudi connector in Glue. I then created a visual Glue ETL job that reads data from S3 and merges it into another S3 location using the Hudi connector. In the Hudi connector component, I added 'path' and 'hoodie.table.name' as connection options. The job fails with the error "hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer". Can anyone help me with this?

asked 3 months ago, 58 views
1 Answer

Hello,

Hudi requires Spark to use KryoSerializer as its serializer. The code below sets it (and disables spark.sql.hive.convertMetastoreParquet) when building the SparkSession:

import sys
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import col, to_timestamp, monotonically_increasing_id, to_date, when
from awsglue.utils import getResolvedOptions
from pyspark.sql.types import *
from datetime import datetime

args = getResolvedOptions(sys.argv, ['JOB_NAME','curated_bucket'])

# Hudi requires Kryo serialization; set it before the session is created.
spark = (
    SparkSession.builder
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
    .config('spark.sql.hive.convertMetastoreParquet', 'false')
    .getOrCreate()
)
sc = spark.sparkContext
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
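
If the error persists, it may help to confirm that the serializer setting actually took effect, since a setting passed to the builder might not apply when a Spark context already exists. The following is only a sketch: the names HUDI_SPARK_SETTINGS, apply_settings and check_serializer are hypothetical helpers, not part of Glue, Spark or Hudi, and the only Spark calls assumed are builder.config(key, value) and SparkConf.get(key, default).

```python
KRYO = "org.apache.spark.serializer.KryoSerializer"

# The two settings used in the answer above (the dict name is illustrative).
HUDI_SPARK_SETTINGS = {
    "spark.serializer": KRYO,
    "spark.sql.hive.convertMetastoreParquet": "false",
}


def apply_settings(builder, settings):
    """Call builder.config(key, value) for each setting and return the builder."""
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder


def check_serializer(get_conf):
    """Fail fast if the effective serializer is not Kryo.

    get_conf is any callable taking (key, default), for example
    spark.sparkContext.getConf().get in a Spark job.
    """
    actual = get_conf("spark.serializer", None)
    if actual != KRYO:
        raise RuntimeError(
            "spark.serializer is %r, but Hudi expects %s" % (actual, KRYO)
        )
    return actual


# Intended use inside the Glue script (not run here):
# spark = apply_settings(SparkSession.builder, HUDI_SPARK_SETTINGS).getOrCreate()
# check_serializer(spark.sparkContext.getConf().get)
```

Running the check right after getOrCreate() should surface a misconfigured serializer before the Hudi write is attempted.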

Reference:

[1] https://hudi.apache.org/docs/quick-start-guide/

answered 3 months ago
