AWS Glue Job Error : An error occurred while calling o115.getDynamicFrame.: java.lang.UnsupportedOperationException: empty.reduceLeft


Hello friends, I am working on a project that reads data from Dremio, a data lakehouse solution, and I am trying to read data from one of its schemas. Glue does not ship with a native connector for it, so I had to build a custom JDBC connection.

Here is my code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkConf, SparkContext
from pyspark.sql import SparkSession
from awsglue.context import GlueContext
from awsglue.job import Job
conf = SparkConf()

args = getResolvedOptions(sys.argv, ['JOB_NAME'])

conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")\
    .set("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")\
    .set("spark.sql.catalog.glue_catalog.warehouse", "s3://dev-smt-data-cache/stageZone_iceberg/iceber_repo/smt-data/")\
    .set("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")\
    .set("spark.sql.catalog.glue_catalog.io-impl","org.apache.iceberg.aws.s3.S3FileIO")\
    .set("--datalake-formats","iceberg")

sc = SparkContext(conf=conf)
glueContext = GlueContext(sc)

# below spark session will have the above configuration
spark = glueContext.spark_session   
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

DynamicFrame = glueContext.create_dynamic_frame.from_options(
        connection_type = "jdbc", 
        connection_options = {
        "query":""" 'SELECT FileID FROM "mp2-appsrvspace".ESG.CTG."USG_2000_Transaction" LIMIT 10' """,
        "inferSchema":True,
        # "dbtable": """ "mp2-appsrvspace".ESG.CTG."USG_2000_Transaction" """,
        "connectionName":"Dremio-Stage"}
         #"transformation_ctx" = "DynamicFrame"
         )

applyformat = ApplyMapping.apply(
    frame =DynamicFrame, 
    mappings =
        [("field1","string","field1","string")
        #("field2","string","field2","string") 
        ], 
    transformation_ctx = "applyformat"
    )      
dynamicFrame = DynamicFrame.toDF().createOrReplaceTempView("temp_table")
print(dynamicFrame.head(5))

I keep getting the error below. I have tried different approaches, but none of them has worked.

   raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o115.getDynamicFrame.
: java.lang.UnsupportedOperationException: empty.reduceLeft

I would appreciate any guidance or hints on how to fix this problem.

SAM
asked 1 month ago, 158 views
2 Answers

Hello,

The error 'empty.reduceLeft' is a Scala error that occurs when 'reduceLeft' is called on an empty collection such as a list: a collection must have at least one element for a 'reduce' operation to work. The error suggests that a collection the job expected to contain data was empty. Please review your data for inconsistencies such as null or empty fields.
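To make the failure mode concrete, here is a minimal Python analogue. It is only an illustration and not the Glue or Scala code that raised the error: Python's `functools.reduce` plays the role of Scala's `reduceLeft`, and the `combine` helper is a hypothetical name introduced for this sketch. Without a starting value, a reduce over an empty collection has nothing to return, so it raises.

```python
from functools import reduce


def combine(parts):
    """Fold a list from left to right, similar in spirit to Scala's reduceLeft."""
    return reduce(lambda left, right: left + right, parts)


# With at least one element the fold is well defined.
combined = combine(["a", "b", "c"])
print(combined)

# With no elements there is nothing to start from, so reduce raises.
try:
    combine([])
except TypeError as exc:
    print("empty input:", exc)

# Passing an initial value (comparable to Scala's foldLeft) defines the empty case.
safe = reduce(lambda left, right: left + right, [], "")
print(repr(safe))
```

In Scala the exception type is `UnsupportedOperationException` rather than `TypeError`, but the cause is the same: the code assumed at least one element and received none.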

Additionally, please review the below references and inspect your code accordingly:

[1] https://www.garysieling.com/blog/fixing-scala-error-reduce-java-lang-unsupportedoperationexception-empty-reduceleft/
[2] https://stackoverflow.com/questions/6986241/is-it-valid-to-reduce-on-an-empty-set-of-sets
[3] https://nrinaudo.github.io/scala-best-practices/partial_functions/traversable_reduce.html

If you would like further support in investigating this issue, please raise a case with AWS Premium Support team and provide your Glue Job Run ID.

Thanks

AWS
Support Engineer
answered 1 month ago

In this context, that error very likely means that Glue is trying to read the username and password properties from the connection but one of them is missing (double-check the exact spelling of the property names).
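One way to check this is to inspect the connection's property map for the credential keys. The sketch below is an assumption-laden illustration: `find_missing_credentials` is a hypothetical helper, the sample dictionaries and the JDBC URL are placeholders, and it assumes the connection stores its credentials under the `USERNAME` and `PASSWORD` keys rather than in a secret.

```python
def find_missing_credentials(properties, required=("USERNAME", "PASSWORD")):
    """Return the required keys that are absent or blank in a connection's properties."""
    return [key for key in required if not str(properties.get(key) or "").strip()]


# Placeholder property maps, not taken from the original job.
ok_props = {
    "JDBC_CONNECTION_URL": "jdbc:dremio:direct=example-host:31010",
    "USERNAME": "svc_user",
    "PASSWORD": "example-password",
}
typo_props = {
    "JDBC_CONNECTION_URL": "jdbc:dremio:direct=example-host:31010",
    "USER": "svc_user",  # misspelled key: the check looks for USERNAME
    "PASSWORD": "example-password",
}

print(find_missing_credentials(ok_props))
print(find_missing_credentials(typo_props))
```

In a real account, the property map could be fetched with boto3, for example `boto3.client("glue").get_connection(Name="Dremio-Stage")["Connection"]["ConnectionProperties"]`, provided the caller is allowed to read the connection. If the connection keeps its credentials in Secrets Manager instead, these two keys would be expected to be absent, so the list of required keys would need adjusting.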

AWS
Expert
answered 1 month ago
AWS
Support Engineer
reviewed 1 month ago
