AWS Glue job error: An error occurred while calling o115.getDynamicFrame: java.lang.UnsupportedOperationException: empty.reduceLeft

Hello friends, I am working on a project that reads data from the Dremio data lakehouse, and I am trying to read data from one of its schemas. Glue does not ship with a native connector for Dremio, so I had to build a custom JDBC connection.

Here is my code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkConf, SparkContext
from pyspark.sql import SparkSession
from awsglue.context import GlueContext
from awsglue.job import Job
conf = SparkConf()

args = getResolvedOptions(sys.argv, ['JOB_NAME'])

conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")\
    .set("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")\
    .set("spark.sql.catalog.glue_catalog.warehouse", "s3://dev-smt-data-cache/stageZone_iceberg/iceber_repo/smt-data/")\
    .set("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")\
    .set("spark.sql.catalog.glue_catalog.io-impl","org.apache.iceberg.aws.s3.S3FileIO")\
    .set("--datalake-formats","iceberg")

sc = SparkContext(conf=conf)
glueContext = GlueContext(sc)

# below spark session will have the above configuration
spark = glueContext.spark_session   
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

DynamicFrame = glueContext.create_dynamic_frame.from_options(
        connection_type = "jdbc", 
        connection_options = {
        "query":""" 'SELECT FileID FROM "mp2-appsrvspace".ESG.CTG."USG_2000_Transaction" LIMIT 10' """,
        "inferSchema":True,
        # "dbtable": """ "mp2-appsrvspace".ESG.CTG."USG_2000_Transaction" """,
        "connectionName":"Dremio-Stage"}
         #"transformation_ctx" = "DynamicFrame"
         )

applyformat = ApplyMapping.apply(
    frame =DynamicFrame, 
    mappings =
        [("field1","string","field1","string")
        #("field2","string","field2","string") 
        ], 
    transformation_ctx = "applyformat"
    )      
dynamicFrame = DynamicFrame.toDF().createOrReplaceTempView("temp_table")
print(dynamicFrame.head(5))

I keep getting the error below. I have tried different approaches, but none of them has worked:

   raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o115.getDynamicFrame.
: java.lang.UnsupportedOperationException: empty.reduceLeft

I would appreciate any guidance or hints on how to fix this problem.

SAM
asked 14 days ago, 137 views
2 Answers

Hello,

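'empty.reduceLeft' is a Scala error that is raised when 'reduceLeft' is called on an empty collection such as a list; a collection must contain at least one element for a 'reduce' operation to work. ("Left" refers to the direction in which the elements are combined, not to one side of the data.) The error therefore suggests that a collection Glue expected to be populated while building the DynamicFrame was empty. Please review your data and job inputs for null or empty fields.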
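
For readers more familiar with Python than Scala, the same failure mode can be reproduced with functools.reduce. This is only an analogue, not the code Glue runs internally: the helper names total and total_or_default are made up for illustration. Reducing an empty sequence without an initial value raises an exception, while supplying an initial value (the counterpart of Scala's foldLeft) does not.

```python
from functools import reduce


def total(values):
    """Sum values with reduce and no initial value (fails on empty input)."""
    return reduce(lambda a, b: a + b, values)


def total_or_default(values, default=0):
    """Sum values with an initial value, so empty input returns the default."""
    return reduce(lambda a, b: a + b, values, default)


print(total([1, 2, 3]))

try:
    total([])
except TypeError as exc:
    # Python's counterpart of Scala's UnsupportedOperationException here.
    print(f"reduce failed on empty input: {exc}")

print(total_or_default([]))
```

The point of the comparison is that the exception says nothing about which input was empty; that has to be worked out from the surrounding call, here getDynamicFrame.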

Additionally, please review the below references and inspect your code accordingly:

[1] https://www.garysieling.com/blog/fixing-scala-error-reduce-java-lang-unsupportedoperationexception-empty-reduceleft/
[2] https://stackoverflow.com/questions/6986241/is-it-valid-to-reduce-on-an-empty-set-of-sets
[3] https://nrinaudo.github.io/scala-best-practices/partial_functions/traversable_reduce.html

If you would like further support in investigating this issue, please raise a case with the AWS Premium Support team and provide your Glue job run ID.

Thanks

AWS
SUPPORT ENGINEER
answered 14 days ago

In this context, the error very likely means that Glue is trying to read the username and password properties from the connection, but one of them is missing (double-check the exact spelling of the property names).
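
As a rough illustration of that check, the sketch below reports which credential-related keys are absent or blank in a connection options dictionary before it is handed to Glue. The key names url, user, and password are an assumption based on the options Glue documents for JDBC connections; a custom connector may expect different names. The helper missing_jdbc_options is hypothetical and not part of the Glue API.

```python
from typing import Iterable, List, Mapping

# Assumed property names; adjust to whatever your connection actually uses.
REQUIRED_JDBC_KEYS = ("url", "user", "password")


def missing_jdbc_options(
    options: Mapping[str, object],
    required: Iterable[str] = REQUIRED_JDBC_KEYS,
) -> List[str]:
    """Return the required keys that are absent or blank in `options`."""
    missing = []
    for key in required:
        value = options.get(key)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(key)
    return missing


# Hypothetical options modelled on the question: only a connection name
# and a query are supplied, with no explicit URL or credentials.
example_options = {
    "connectionName": "Dremio-Stage",
    "query": 'SELECT FileID FROM "mp2-appsrvspace".ESG.CTG."USG_2000_Transaction" LIMIT 10',
}

print(missing_jdbc_options(example_options))
```

Running a check like this against the properties actually stored on the Glue connection, with the key names your connector expects, should show quickly whether a misspelled or missing property is the cause.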

AWS
EXPERT
answered 14 days ago
AWS
SUPPORT ENGINEER
reviewed 12 days ago