Redshift - Glue Dynamic Data Frame Error


Hi,

First-time Glue user here and would like some help.

Having trouble building a custom dynamic frame connecting to a Redshift cluster and moving data from one table to another within the same cluster. I had no issues connecting to Aurora or RDS and doing the same thing. The weird part is that the crawler connects to the Redshift cluster successfully and populates the Data Catalog with the desired tables.

Glue Job Run Feedback (Job Id: jr_4968961060836ce01c57a6d9a5f0f1873b44292dccbd06706da031164383f481)

Error: Py4JJavaError: An error occurred while calling o59.getDynamicFrame.

Script:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['TempDir','JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "rsft_beta_glue_views", table_name = "mgdwstage_glue_views_daily_unique_opens_by_cust_table_stg", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasource0")
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("activity_date", "date", "activity_date", "date"), ("customer_id", "int", "customer_id", "int"), ("num_unique_opens", "long", "num_unique_opens", "long")], transformation_ctx = "applymapping1")
selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["activity_date", "customer_id", "num_unique_opens"], transformation_ctx = "selectfields2")
datasink5 = glueContext.write_dynamic_frame.from_catalog(frame = selectfields2, database = "rsft_beta_glue_views", table_name = "mgdwstage_glue_views_daily_unique_opens_by_cust_table", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink5")
job.commit()
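
To help narrow this down, here is a sketch for reading the same staging table through an explicit JDBC connection instead of the catalog connection (the endpoint, database, schema/table, user, and password below are placeholders; it reuses glueContext and args from the script above):

# Minimal sketch with placeholder connection details -- adjust before running.
connection_options = {
    "url": "jdbc:redshift://my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com:5439/my_db",  # placeholder endpoint/db
    "dbtable": "my_schema.my_staging_table",  # placeholder schema.table
    "user": "my_user",          # placeholder
    "password": "my_password",  # placeholder
    "redshiftTmpDir": args["TempDir"],
    # "aws_iam_role": "arn:aws:iam::123456789012:role/my-redshift-role",  # may be needed for the S3 UNLOAD/COPY, depending on setup
}
test_frame = glueContext.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options=connection_options,
)
print(test_frame.count())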

Thanks
KI

Edited by: DatumVeritas on Aug 29, 2017 12:51 PM

Asked 7 years ago · 2,357 views
2 Answers

Update: AWS Glue Tech Support solved my case.

AWS Community: It turned out to be a limitation where the Glue ETL job doesn't like a special character in the Redshift password, even though the Data Catalog has no issue with it.
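
For reference, a minimal sketch of pointing the Glue connection at a new password after changing it in Redshift to one without special characters. The connection name "redshift-beta" and the new password are placeholders, and the caller needs glue:GetConnection and glue:UpdateConnection permissions:

import boto3

glue = boto3.client("glue")

# Fetch the existing catalog connection definition.
conn = glue.get_connection(Name="redshift-beta")["Connection"]

# Copy the connection properties and swap in the rotated password.
props = dict(conn["ConnectionProperties"])
props["PASSWORD"] = "NewPasswordWithoutSpecialChars1"  # placeholder

# Rebuild only the writable part of the connection definition.
connection_input = {
    "Name": conn["Name"],
    "ConnectionType": conn["ConnectionType"],
    "ConnectionProperties": props,
}
if "PhysicalConnectionRequirements" in conn:
    connection_input["PhysicalConnectionRequirements"] = conn["PhysicalConnectionRequirements"]

glue.update_connection(Name=conn["Name"], ConnectionInput=connection_input)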

Answered 7 years ago

Hi,

I am getting a similar error in a Zeppelin notebook, complaining about "Py4JJavaError: An error occurred while calling getDynamicFrame".

I have Glue connecting to RDS. Below is the exception stack:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-1897882756365177437.py", line 349, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-1897882756365177437.py", line 337, in <module>
    exec(code)
  File "<stdin>", line 1, in <module>
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/dynamicframe.py", line 508, in from_catalog
    return self.glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, **kwargs)
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/context.py", line 120, in create_dynamic_frame_from_catalog
    return source.getFrame(**kwargs)
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/data_source.py", line 36, in getFrame
    jframe = self.jsource.getDynamicFrame()
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o1201.getDynamicFrame.
: java.lang.NullPointerException
    at java.util.Hashtable.put(Hashtable.java:459)
    at java.util.Hashtable.putAll(Hashtable.java:523)
    at com.amazonaws.services.glue.util.JDBCUtils$class.mergeProperties(JDBCUtils.scala:40)
    at com.amazonaws.services.glue.util.OracleUtils$.mergeProperties(JDBCUtils.scala:69)
    at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:549)
    at com.amazonaws.services.glue.JDBCDataSource.getPrimaryKeys(DataSource.scala:392)
    at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:354)
    at com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:45)
    at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:317)
    at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
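
From the trace, Hashtable.put throws a NullPointerException for null keys or values, which suggests one of the JDBC properties being merged is null. A minimal sketch for checking that the basic JDBC properties are actually set on the catalog connection (the connection name "my-rds-connection" is a placeholder):

import boto3

glue = boto3.client("glue")
props = glue.get_connection(Name="my-rds-connection")["Connection"]["ConnectionProperties"]

# Flag any of the core JDBC properties that are missing or empty.
for key in ("JDBC_CONNECTION_URL", "USERNAME", "PASSWORD"):
    print(key, "present" if props.get(key) else "MISSING")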

MParikh
Answered 6 years ago
