Redshift - Glue Dynamic Data Frame Error


Hi,

First time glue user and would like some help.

I'm having trouble building a dynamic frame that connects to a Redshift cluster and moves data from one table to another within the same cluster. I had no issues connecting to Aurora or RDS and doing the same thing. The odd part is that the crawler connects to the Redshift cluster successfully and populates the Data Catalog with the desired tables.

Glue job run feedback (Job ID: jr_4968961060836ce01c57a6d9a5f0f1873b44292dccbd06706da031164383f481)

Error: Py4JJavaError: An error occurred while calling o59.getDynamicFrame.

Script:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['TempDir','JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
# Read the source table from the Data Catalog, staging through S3 (TempDir)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "rsft_beta_glue_views", table_name = "mgdwstage_glue_views_daily_unique_opens_by_cust_table_stg", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasource0")
# Map each source column to the same name and type in the target
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("activity_date", "date", "activity_date", "date"), ("customer_id", "int", "customer_id", "int"), ("num_unique_opens", "long", "num_unique_opens", "long")], transformation_ctx = "applymapping1")
# Keep only the three mapped columns
selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["activity_date", "customer_id", "num_unique_opens"], transformation_ctx = "selectfields2")
# Write to the target table in the same Redshift cluster via the catalog
datasink5 = glueContext.write_dynamic_frame.from_catalog(frame = selectfields2, database = "rsft_beta_glue_views", table_name = "mgdwstage_glue_views_daily_unique_opens_by_cust_table", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink5")
job.commit()

Thanks
KI

Edited by: DatumVeritas on Aug 29, 2017 12:51 PM

asked 7 years ago · 2,346 views
2 Answers

Update: AWS Glue tech support solved my case.

For the AWS community: it appears to be a limitation where the Glue ETL job doesn't handle a special character in the Redshift password, while the Data Catalog has no issue with it.
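If anyone else hits this, a quick way to check a password before rotating it is to scan it for characters that are reserved in a JDBC connection URL. This is just a sketch, not an official Glue utility, and the exact set of characters that trips up the ETL job is an assumption on my part:

```python
# Characters reserved in URL syntax that a JDBC connection string may
# mishandle when they appear unescaped in the password (assumed set).
JDBC_RESERVED = set(";/?:@&=+$,%#")

def jdbc_unsafe_chars(password):
    """Return the characters in `password` that are reserved in a JDBC
    connection URL and may require rotating the password."""
    return sorted(set(password) & JDBC_RESERVED)

# Example: flags '@' and ';' as potentially problematic.
print(jdbc_unsafe_chars("p@ss;word"))
```

If this returns anything, rotating the Redshift password to alphanumeric-only characters and updating the Glue connection is the simplest workaround.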

answered 7 years ago

Hi,

I am getting a similar error with a Zeppelin notebook, complaining about "Py4JJavaError: An error occurred while calling getDynamicFrame".

I have Glue connecting to RDS. Below is the exception stack:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-1897882756365177437.py", line 349, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-1897882756365177437.py", line 337, in <module>
    exec(code)
  File "<stdin>", line 1, in <module>
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/dynamicframe.py", line 508, in from_catalog
    return self.glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, **kwargs)
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/context.py", line 120, in create_dynamic_frame_from_catalog
    return source.getFrame(**kwargs)
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/data_source.py", line 36, in getFrame
    jframe = self.jsource.getDynamicFrame()
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o1201.getDynamicFrame.
: java.lang.NullPointerException
  at java.util.Hashtable.put(Hashtable.java:459)
  at java.util.Hashtable.putAll(Hashtable.java:523)
  at com.amazonaws.services.glue.util.JDBCUtils$class.mergeProperties(JDBCUtils.scala:40)
  at com.amazonaws.services.glue.util.OracleUtils$.mergeProperties(JDBCUtils.scala:69)
  at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:549)
  at com.amazonaws.services.glue.JDBCDataSource.getPrimaryKeys(DataSource.scala:392)
  at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:354)
  at com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:45)
  at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:317)
  at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
  at py4j.Gateway.invoke(Gateway.java:280)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:214)
  at java.lang.Thread.run(Thread.java:748)
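The NullPointerException at Hashtable.put suggests one of the merged JDBC properties (user, password, URL, etc.) is null on the Java side, since java.util.Hashtable rejects null keys and values. A defensive check like the one below can surface the missing option with a readable Python error instead; this is a sketch with an assumed helper name and an assumed list of required options, not part of the Glue API:

```python
def validate_connection_options(opts, required=("url", "user", "password", "dbtable")):
    """Raise early if any assumed-required JDBC option is missing or None,
    instead of letting the null surface later as a Java-side
    NullPointerException inside Hashtable.putAll."""
    missing = [k for k in required if opts.get(k) is None]
    if missing:
        raise ValueError("Missing JDBC connection options: " + ", ".join(missing))
    return opts
```

One could call this on the connection_options dict before handing it to glueContext.create_dynamic_frame.from_options, so a misconfigured connection fails with a clear message.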

MParikh
answered 6 years ago
