Redshift - Glue Dynamic Data Frame Error


Hi,

First-time Glue user here and I would like some help.

I am having trouble building a DynamicFrame that connects to a Redshift cluster and moves data from one table to another within the same cluster; I had no issues connecting to Aurora or RDS and doing the same thing. The odd part is that the crawler connects to the Redshift cluster successfully and populates the Data Catalog with the desired tables.

Glue Job Run Feedback (Job Id: jr_4968961060836ce01c57a6d9a5f0f1873b44292dccbd06706da031164383f481)

Error: Py4JJavaError: An error occurred while calling o59.getDynamicFrame.

Script:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Resolve the job name and the temp directory used for Redshift UNLOAD/COPY staging
args = getResolvedOptions(sys.argv, ['TempDir','JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the staging table from the Data Catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "rsft_beta_glue_views", table_name = "mgdwstage_glue_views_daily_unique_opens_by_cust_table_stg", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasource0")

# Map source columns to target columns (names and types unchanged)
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("activity_date", "date", "activity_date", "date"), ("customer_id", "int", "customer_id", "int"), ("num_unique_opens", "long", "num_unique_opens", "long")], transformation_ctx = "applymapping1")

# Keep only the three mapped columns
selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["activity_date", "customer_id", "num_unique_opens"], transformation_ctx = "selectfields2")

# Write into the target table in the same Redshift cluster
datasink5 = glueContext.write_dynamic_frame.from_catalog(frame = selectfields2, database = "rsft_beta_glue_views", table_name = "mgdwstage_glue_views_daily_unique_opens_by_cust_table", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink5")
job.commit()

Thanks
KI

Edited by: DatumVeritas on Aug 29, 2017 12:51 PM

Asked 7 years ago · 2,357 views
2 Answers

Update: AWS Glue Tech Support solved my case.

AWS Community: it appears to be a limitation where the Glue ETL job does not accept a Redshift password containing special characters, even though the Data Catalog (crawler) has no problem with it.
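For anyone hitting the same thing before a password change is possible: one workaround, sketched below, is to bypass the stored catalog connection for the ETL read/write and pass the Redshift JDBC credentials directly through connection_options. This is only a minimal sketch of that approach; the cluster endpoint, database, user, password and dbtable values are placeholders (not values from this post), and it assumes the job role can reach the cluster and the TempDir bucket. The simpler fix, per support, is just to use a Redshift password without special characters in the Glue connection.

# Sketch: read/write Redshift via from_options instead of the catalog connection,
# so the password does not come from the stored connection.
# Endpoint, database, user, password and table names below are placeholders.
datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "url": "jdbc:redshift://my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com:5439/mydb",
        "user": "glue_user",
        "password": "PasswordWithoutSpecialChars1",
        "dbtable": "stage.daily_unique_opens_by_cust_table_stg",
        "redshiftTmpDir": args["TempDir"],
    },
    transformation_ctx="datasource0",
)

datasink = glueContext.write_dynamic_frame.from_options(
    frame=selectfields2,
    connection_type="redshift",
    connection_options={
        "url": "jdbc:redshift://my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com:5439/mydb",
        "user": "glue_user",
        "password": "PasswordWithoutSpecialChars1",
        "dbtable": "stage.daily_unique_opens_by_cust_table",
        "redshiftTmpDir": args["TempDir"],
    },
    transformation_ctx="datasink",
)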

Answered 7 years ago

Hi,

I am getting a similar error in a Zeppelin notebook, complaining about "Py4JJavaError: An error occurred while calling getDynamicFrame".

I have Glue connecting to RDS. Below is the exception stack:

Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-1897882756365177437.py", line 349, in <module>
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-1897882756365177437.py", line 337, in <module>
exec(code)
File "<stdin>", line 1, in <module>
File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/dynamicframe.py", line 508, in from_catalog
return self.glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, **kwargs)
File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/context.py", line 120, in create_dynamic_frame_from_catalog
return source.getFrame(**kwargs)
File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/data_source.py", line 36, in getFrame
jframe = self.jsource.getDynamicFrame()
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in call
answer, self.gateway_client, self.target_id, self.name)
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o1201.getDynamicFrame.
: java.lang.NullPointerException
at java.util.Hashtable.put(Hashtable.java:459)
at java.util.Hashtable.putAll(Hashtable.java:523)
at com.amazonaws.services.glue.util.JDBCUtils$class.mergeProperties(JDBCUtils.scala:40)
at com.amazonaws.services.glue.util.OracleUtils$.mergeProperties(JDBCUtils.scala:69)
at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:549)
at com.amazonaws.services.glue.JDBCDataSource.getPrimaryKeys(DataSource.scala:392)
at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:354)
at com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:45)
at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:317)
at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
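
For what it's worth, the NullPointerException at java.util.Hashtable.put inside JDBCUtils.mergeProperties suggests a null value being merged into the JDBC connection properties (Hashtable does not allow null keys or values), which usually points at a missing username, password or JDBC URL on the catalog connection the table uses. Below is a minimal sketch for checking that, assuming boto3 is available and a connection named "my-rds-connection" (a placeholder for whatever your crawler/table actually uses):

import boto3

# Print whether the key properties of the Glue catalog connection are populated.
# "my-rds-connection" is a placeholder name.
glue = boto3.client("glue")
conn = glue.get_connection(Name="my-rds-connection")
props = conn["Connection"]["ConnectionProperties"]

for key in ("JDBC_CONNECTION_URL", "USERNAME", "PASSWORD"):
    print(key, "->", "set" if props.get(key) else "<missing or empty>")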

MParikh
Answered 6 years ago
