Redshift - Glue Dynamic Data Frame Error


Hi,

First time glue user and would like some help.

I'm having trouble building a dynamic frame that connects to a Redshift cluster and moves data from one table to another within the same cluster. I had no issues connecting to Aurora or RDS and doing the same thing. The odd part is that the crawler connects to the Redshift cluster successfully and populates the Data Catalog with the desired tables.

Glue job run feedback (Job Id: jr_4968961060836ce01c57a6d9a5f0f1873b44292dccbd06706da031164383f481)

Error: Py4JJavaError: An error occurred while calling o59.getDynamicFrame.

Script:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['TempDir','JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "rsft_beta_glue_views", table_name = "mgdwstage_glue_views_daily_unique_opens_by_cust_table_stg", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasource0")
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("activity_date", "date", "activity_date", "date"), ("customer_id", "int", "customer_id", "int"), ("num_unique_opens", "long", "num_unique_opens", "long")], transformation_ctx = "applymapping1")
selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["activity_date", "customer_id", "num_unique_opens"], transformation_ctx = "selectfields2")
datasink5 = glueContext.write_dynamic_frame.from_catalog(frame = selectfields2, database = "rsft_beta_glue_views", table_name = "mgdwstage_glue_views_daily_unique_opens_by_cust_table", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink5")
job.commit()

Thanks
KI

Edited by: DatumVeritas on Aug 29, 2017 12:51 PM

Asked 7 years ago · Viewed 2358 times
2 Answers

Update: AWS Glue Tech Support solved my case.

For the AWS community: it appears to be a limitation where the Glue ETL job does not accept a special character in the Redshift password, even though the Data Catalog (crawler) has no issue with it.
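For anyone hitting the same thing, one workaround (besides rotating the Redshift password to alphanumerics only) is to make sure any special characters are percent-encoded wherever the password ends up inside a JDBC-style URL, so characters like `@`, `/`, and `:` can't be misread as URL delimiters. This is an illustrative sketch, not Glue's internal handling; the host and credential values below are made up:

```python
from urllib.parse import quote

def redshift_jdbc_url(host, port, database, user, password):
    # Percent-encode the credentials so reserved characters such as
    # '@', '/', and ':' cannot be confused with URL delimiters.
    return (f"jdbc:redshift://{host}:{port}/{database}"
            f"?user={quote(user, safe='')}&password={quote(password, safe='')}")

url = redshift_jdbc_url(
    "example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    5439, "dev", "etl_user", "p@ss/w:rd")
# The password portion becomes p%40ss%2Fw%3Ard
```

If you manage the connection through the Glue console, the safer route is simply a password without special characters, since you don't control how Glue builds the URL.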

Answered 7 years ago

Hi,

I am getting a similar error from a Zeppelin notebook, complaining "Py4JJavaError: An error occurred while calling getDynamicFrame".

In my case Glue is connecting to RDS. Below is the exception stack:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-1897882756365177437.py", line 349, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-1897882756365177437.py", line 337, in <module>
    exec(code)
  File "<stdin>", line 1, in <module>
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/dynamicframe.py", line 508, in from_catalog
    return self.glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, **kwargs)
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/context.py", line 120, in create_dynamic_frame_from_catalog
    return source.getFrame(**kwargs)
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/data_source.py", line 36, in getFrame
    jframe = self.jsource.getDynamicFrame()
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o1201.getDynamicFrame.
: java.lang.NullPointerException
    at java.util.Hashtable.put(Hashtable.java:459)
    at java.util.Hashtable.putAll(Hashtable.java:523)
    at com.amazonaws.services.glue.util.JDBCUtils$class.mergeProperties(JDBCUtils.scala:40)
    at com.amazonaws.services.glue.util.OracleUtils$.mergeProperties(JDBCUtils.scala:69)
    at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:549)
    at com.amazonaws.services.glue.JDBCDataSource.getPrimaryKeys(DataSource.scala:392)
    at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:354)
    at com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:45)
    at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:317)
    at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
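A note on reading this trace: `java.util.Hashtable.put` throws a `NullPointerException` whenever the key or value is null, so the failure inside `mergeProperties` means one of the JDBC connection properties (typically the user, password, or URL) resolved to null before the merge. A rough Python sketch of that merge behavior; the function name mirrors the Scala frame above, but this is an illustration, not Glue's actual code:

```python
def merge_properties(base, overrides):
    # java.util.Hashtable rejects null keys and values, which is what
    # aborts Glue's property merge when a connection property is missing.
    merged = dict(base)
    for key, value in overrides.items():
        if key is None or value is None:
            raise ValueError(f"null connection property: {key!r}")
        merged[key] = value
    return merged
```

If you see this from a catalog connection, re-check that the connection's JDBC URL and credentials are all populated before blaming the job script.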

MParikh
Answered 6 years ago
