Redshift - Glue Dynamic Data Frame Error


Hi,

First time glue user and would like some help.

I'm having trouble building a custom dynamic frame that connects to a Redshift cluster and moves data from one table to another within the same cluster; I had no issues connecting to Aurora or RDS and doing the same thing. The strange part is that the crawler connects to the Redshift cluster successfully and populates the Data Catalog with the desired tables.

Glue Job Run Feedback (Job Id: jr_4968961060836ce01c57a6d9a5f0f1873b44292dccbd06706da031164383f481)

Error: Py4JJavaError: An error occurred while calling o59.getDynamicFrame.

Script:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['TempDir','JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "rsft_beta_glue_views", table_name = "mgdwstage_glue_views_daily_unique_opens_by_cust_table_stg", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasource0")
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("activity_date", "date", "activity_date", "date"), ("customer_id", "int", "customer_id", "int"), ("num_unique_opens", "long", "num_unique_opens", "long")], transformation_ctx = "applymapping1")
selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["activity_date", "customer_id", "num_unique_opens"], transformation_ctx = "selectfields2")
datasink5 = glueContext.write_dynamic_frame.from_catalog(frame = selectfields2, database = "rsft_beta_glue_views", table_name = "mgdwstage_glue_views_daily_unique_opens_by_cust_table", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink5")
job.commit()

Thanks
KI

Edited by: DatumVeritas on Aug 29, 2017 12:51 PM

asked 7 years ago · 2,358 views
2 answers

Update: AWS Glue Tech Support solved my case.

AWS Community: it appears to be a limitation where the Glue ETL job cannot handle a special character in the Redshift password, while the Data Catalog has no issue with it.
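For anyone hitting the same wall: one workaround people try (this is my assumption, not an official fix; the confirmed resolution above was changing the password itself) is to percent-encode the password before embedding it in a JDBC URL, so characters like `@`, `:` and `/` cannot be misparsed as URL structure. The helper name `encode_jdbc_password` is hypothetical:

```python
from urllib.parse import quote

def encode_jdbc_password(password: str) -> str:
    # Percent-encode every character that is not URL-safe, so that
    # special characters (e.g. '@', ':', '/') cannot break the JDBC URL.
    # safe="" means even '/' gets encoded.
    return quote(password, safe="")

# A password containing characters that commonly break JDBC URLs:
print(encode_jdbc_password("p@ss:w/rd"))  # prints p%40ss%3Aw%2Frd
```

Whether your specific driver accepts a percent-encoded password depends on the driver; the safest route remains a password without special characters, as Glue support advised.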

answered 7 years ago

Hi,

I am getting a similar error with a Zeppelin notebook, complaining about "Py4JJavaError: An error occurred while calling getDynamicFrame".

I have Glue connecting to RDS. Below is the exception stack:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-1897882756365177437.py", line 349, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-1897882756365177437.py", line 337, in <module>
    exec(code)
  File "<stdin>", line 1, in <module>
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/dynamicframe.py", line 508, in from_catalog
    return self.glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, **kwargs)
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/context.py", line 120, in create_dynamic_frame_from_catalog
    return source.getFrame(**kwargs)
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/data_source.py", line 36, in getFrame
    jframe = self.jsource.getDynamicFrame()
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o1201.getDynamicFrame.
: java.lang.NullPointerException
    at java.util.Hashtable.put(Hashtable.java:459)
    at java.util.Hashtable.putAll(Hashtable.java:523)
    at com.amazonaws.services.glue.util.JDBCUtils$class.mergeProperties(JDBCUtils.scala:40)
    at com.amazonaws.services.glue.util.OracleUtils$.mergeProperties(JDBCUtils.scala:69)
    at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:549)
    at com.amazonaws.services.glue.JDBCDataSource.getPrimaryKeys(DataSource.scala:392)
    at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:354)
    at com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:45)
    at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:317)
    at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
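One reading of this trace: it bottoms out in java.util.Hashtable.put, which throws NullPointerException for null keys or values, so one of the JDBC connection properties being merged (for example the password, as in the accepted answer) is likely null rather than merely malformed. A defensive sketch, assuming you build the options dict yourself before handing it to Glue (the helper name `validate_connection_options` is hypothetical):

```python
def validate_connection_options(options: dict) -> dict:
    """Fail fast with a clear message if any connection property is None,
    instead of letting the null surface later as an opaque Java
    NullPointerException inside Hashtable.put on the driver side."""
    missing = [key for key, value in options.items() if value is None]
    if missing:
        raise ValueError(f"Connection options must not be None: {missing}")
    return options

# Example: a password that was never resolved (e.g. a failed secret lookup).
opts = {"url": "jdbc:redshift://host:5439/db", "user": "etl", "password": None}
try:
    validate_connection_options(opts)
except ValueError as err:
    print(err)  # prints Connection options must not be None: ['password']
```

This does not fix the underlying cause, but it turns a cryptic Py4J stack into an actionable Python error at the point where the bad value enters your script.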

MParikh
answered 6 years ago
