Invoking EvaluateDataQuality().process_rows raises a NoSuchElementException


I'm trying to customize a script built via Visual ETL and came across an error. Steps I followed:

  • A DynamicFrame created from a CSV file in S3 needs to have its data types evaluated.
  • I enforce a custom schema with the apply_mapping function and reset its schema.
  • Then I convert NaN to None (converting the DynamicFrame to a DataFrame and then back to a DynamicFrame).
  • I pass this transformed Parsed_dynamic_Frame (DynamicFrame) to EvaluateDataQuality().process_rows for validation with the ruleset Rules = [ ColumnDataType "Col_name" = "integer" ]:
data_schema_dquee_results_set = EvaluateDataQuality().process_rows(
    frame=Parsed_dynamic_Frame,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "data_quality_schema_rule_set",
        # "enableDataQualityCloudWatchMetrics": True,
        "enableDataQualityResultsPublishing": True
    },
    additional_options={"performanceTuning.caching": "CACHE_NOTHING"}
)

I get the following error:

: An error occurred while calling z:com.amazonaws.services.glue.dq.EvaluateDataQuality.processRows.
: java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:529)
	at scala.None$.get(Option.scala:527)
	at com.amazonaws.services.glue.dq.EvaluateDataQualityHelper.processInner(EvaluateDataQuality.scala:283)
	at com.amazonaws.services.glue.dq.EvaluateDataQualityHelper.process(EvaluateDataQuality.scala:157)
	at com.amazonaws.services.glue.dq.EvaluateDataQualityHelper.process$(EvaluateDataQuality.scala:152)
	at com.amazonaws.services.glue.dq.EvaluateDataQuality$.process(EvaluateDataQuality.scala:64)
	at com.amazonaws.services.glue.dq.EvaluateDataQuality$.processRows(EvaluateDataQuality.scala:124)
	at com.amazonaws.services.glue.dq.EvaluateDataQuality.processRows(EvaluateDataQuality.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)

I couldn't find much information about this error. Can someone help?

FYI, this is all done in an AWS Glue notebook with these settings:

%glue_version 4.0
%idle_timeout 2880
%worker_type G.1X
%number_of_workers 2

Prog
asked 4 months ago, 238 views
1 Answer
Accepted Answer

The "publish" option requires information about the job in order to register the results so they are visible in the job quality tab.
Since you are running a notebook rather than a job, it fails. You would need to disable enableDataQualityResultsPublishing, as you did for enableDataQualityCloudWatchMetrics. I'll open a ticket so this is handled better, but in the meantime you could try setting the properties manually with some placeholders and see if it lets you run the code:

spark._jvm.java.lang.System.setProperty("spark.glue.JOB_NAME", "MyNotebook")
spark._jvm.java.lang.System.setProperty("spark.glue.JOB_RUN_ID", "jr_1234")
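
If you go the "disable publishing" route instead, one way to avoid editing the script each time is to switch the flag on only when the code runs as a real job. The following is a minimal sketch, not an official pattern: the helper name build_publishing_options is hypothetical, and it assumes that a Glue job receives a --JOB_NAME argument in sys.argv while a notebook session does not, which you should confirm in your environment.

```python
import sys


def build_publishing_options(context, argv=None):
    """Build publishing_options, enabling result publishing only for real jobs.

    Assumption: a Glue job run has --JOB_NAME among its arguments,
    while a notebook session does not.
    """
    argv = sys.argv if argv is None else argv
    # Accept both "--JOB_NAME value" and "--JOB_NAME=value" forms.
    running_as_job = any(
        arg == "--JOB_NAME" or arg.startswith("--JOB_NAME=") for arg in argv
    )
    return {
        "dataQualityEvaluationContext": context,
        "enableDataQualityResultsPublishing": running_as_job,
    }


# Usage: pass the result as publishing_options to process_rows, e.g.
# publishing_options=build_publishing_options("data_quality_schema_rule_set")
```

The returned dict can be passed as publishing_options in the process_rows call from the question, leaving the other arguments unchanged.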

AWS
EXPERT
answered 4 months ago
EXPERT
reviewed 4 months ago
  • @Gonzalo Herreros, setting enableDataQualityResultsPublishing to False prevented the error. But does that mean I won't be able to access the DataQualityRulesPass, DataQualityRulesFail, DataQualityRulesSkip, and DataQualityEvaluationResult columns?

  • It looks like I actually need the job results for further processing.
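
On the follow-up above: the row-level columns normally come from the collection that process_rows returns, not from the publishing step, so something along these lines may still work with publishing disabled. This is a sketch under assumptions, untested in a notebook session: it assumes the returned DynamicFrameCollection exposes the keys "rowLevelOutcomes" and "ruleOutcomes", as in scripts generated by Visual ETL, and the helper name split_dq_results is hypothetical.

```python
def split_dq_results(dq_collection):
    """Return (row_level_df, rule_outcomes_df) as Spark DataFrames.

    Assumption: dq_collection is the DynamicFrameCollection returned by
    EvaluateDataQuality().process_rows and contains both keys used below.
    """
    # Imported lazily because awsglue only exists inside the Glue runtime.
    from awsglue.transforms import SelectFromCollection

    # Original rows plus the per-row DataQuality* columns.
    row_level = SelectFromCollection.apply(
        dfc=dq_collection,
        key="rowLevelOutcomes",
        transformation_ctx="row_level_outcomes",
    )
    # One row per rule with its outcome.
    rule_outcomes = SelectFromCollection.apply(
        dfc=dq_collection,
        key="ruleOutcomes",
        transformation_ctx="rule_outcomes",
    )
    return row_level.toDF(), rule_outcomes.toDF()


# Usage inside a Glue session:
# rows_df, rules_df = split_dq_results(data_schema_dquee_results_set)
# rows_df.select("DataQualityEvaluationResult").show()
```

If the keys differ in your Glue version, listing the collection's keys with data_schema_dquee_results_set.keys() should show what is available.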
