Is there a hello world for running PyDeequ in Glue?

Hi there, I'm trying to use PyDeequ in a Glue 3.0 ETL job by following this tutorial.

However, I get this error:

2023-07-01 13:57:35,492 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
  File "/tmp/glue_job_integration.py", line 144, in <module>
    .onData(sdf)
  File "/home/spark/.local/lib/python3.7/site-packages/pydeequ/analyzers.py", line 52, in onData
    return AnalysisRunBuilder(self._spark_session, df)
  File "/home/spark/.local/lib/python3.7/site-packages/pydeequ/analyzers.py", line 124, in __init__
    self._AnalysisRunBuilder = self._jvm.com.amazon.deequ.analyzers.runners.AnalysisRunBuilder(df._jdf)
TypeError: 'JavaPackage' object is not callable

I realize the example was written for SageMaker. Does anybody have a suggestion? This is my first time using Deequ. Thanks for reading!
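
For context on the error: py4j raises `TypeError: 'JavaPackage' object is not callable` when a Java class cannot be found on the JVM classpath. The unresolved name `com.amazon.deequ.analyzers.runners.AnalysisRunBuilder` is treated as a package, and a package cannot be called. The traceback shows that the pydeequ Python package itself is installed, so the Deequ JAR is the likely missing piece. Below is a minimal sketch, assuming the JAR is supplied through the Glue job parameter `--extra-jars` and pydeequ through `--additional-python-modules`. The helper name `build_glue_default_arguments` and the S3 URI are hypothetical placeholders, and the JAR has to be one built for the Spark version of the job.

```python
def build_glue_default_arguments(deequ_jar_s3_uri, pydeequ_requirement="pydeequ"):
    """Build Glue job arguments that put the Deequ JAR on the classpath.

    Hypothetical helper for illustration; only the two parameter names
    (--extra-jars, --additional-python-modules) come from AWS Glue.
    """
    if not deequ_jar_s3_uri.startswith("s3://"):
        raise ValueError("--extra-jars entries are expected to be S3 URIs")
    return {
        # JAR containing com.amazon.deequ.* classes, loaded into the JVM
        "--extra-jars": deequ_jar_s3_uri,
        # pip requirement installed into the job's Python environment
        "--additional-python-modules": pydeequ_requirement,
    }


# Placeholder location: upload a Deequ JAR matching the job's Spark version.
args = build_glue_default_arguments("s3://my-bucket/jars/deequ.jar")
print(args)
```

The resulting dictionary has the shape Glue uses for a job's default arguments, for example the `DefaultArguments` parameter of boto3's `create_job`, or the job parameters section in the console.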

asked 10 months ago, 338 views
1 Answer
