AWS Glue Hudi Connector job fails with error 'hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer'


I have subscribed to the AWS Marketplace Glue Hudi Connector and added a Hudi connection in Glue. I created a visual Glue ETL job that reads data from S3 and merges it into another S3 location using the Hudi connector. In the Hudi connector component, I added 'path' and 'hoodie.table.name' as connection options. The job fails with the error "hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer". Can anyone help me with this?

Asked 2 years ago, 947 views
1 answer

Hello,

Hudi requires Spark to use KryoSerializer, and the error means the job is running with a different spark.serializer. Set it on the SparkSession before the Glue context is created, as in the code below:

import sys
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import col, to_timestamp, monotonically_increasing_id, to_date, when
from awsglue.utils import getResolvedOptions
from pyspark.sql.types import *
from datetime import datetime

# Resolve job arguments; 'curated_bucket' must be supplied as a job parameter.
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'curated_bucket'])

# Hudi requires Kryo serialization, so set it before the session is created.
spark = (
    SparkSession.builder
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
    .config('spark.sql.hive.convertMetastoreParquet', 'false')
    .getOrCreate()
)
sc = spark.sparkContext
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
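
Since the job in the question is a visual job, editing the generated script may not be convenient. A possible alternative, assuming your Glue version honors the `--conf` job parameter, is to pass the same two Spark settings as a job parameter instead. Glue job parameters take one value per key, so additional settings are usually chained inside the value of a single `--conf` key. The sketch below only builds that value string; the helper name `build_glue_conf_value` is hypothetical and not part of any AWS library.

```python
# Spark settings taken from the script above.
HUDI_SPARK_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.hive.convertMetastoreParquet": "false",
}


def build_glue_conf_value(settings):
    """Join key=value pairs into one value for a single '--conf' job parameter.

    Hypothetical helper: every setting after the first is prefixed
    with ' --conf ' so that all of them fit under one parameter key.
    """
    pairs = [f"{key}={value}" for key, value in settings.items()]
    return " --conf ".join(pairs)


conf_value = build_glue_conf_value(HUDI_SPARK_CONF)
print(conf_value)
```

The printed string would go in the job details under Job parameters, with `--conf` as the key and the string as the value. This is an assumption about your setup, so confirm it against the Glue documentation for your version before relying on it.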

Reference:

[1] https://hudi.apache.org/docs/quick-start-guide/

AWS
Answered 2 years ago
