I am using AWS Glue to migrate data from the Glue Data Catalog to Redshift Serverless and S3 after some transformations.
Since I am still learning Glue, I am using Redshift Serverless to save money.
I tried two methods:
Method 1: Visual ETL script
This method worked well. I was able to store the **transformed** file to S3 and load it into the Redshift Serverless database; the table is populated with the data as expected.
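For context, the script generated by the visual job starts with the standard Glue boilerplate before the write steps. A minimal sketch (database, table, and mapping names are placeholders for everything not shown in this post):

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Source: read from the Glue Data Catalog (database/table names are placeholders)
source_dyf = glueContext.create_dynamic_frame.from_catalog(
    database="my_catalog_db",
    table_name="my_catalog_table",
    transformation_ctx="source_dyf",
)

# Transformation: the visual "Change Schema" node maps to ApplyMapping
changeSchema_Transformation = ApplyMapping.apply(
    frame=source_dyf,
    mappings=[("old_col", "string", "new_col", "string")],  # placeholder mapping
    transformation_ctx="changeSchema_Transformation",
)
```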
Method 2: Spark script
When I use the script generated in Method 1 to create a new ETL job, the job fails with this error:

```
An error occurred while calling o86.getSink. The connection attempt failed.
```
The error log (from the driver log stream):

```
23/09/18 15:18:43 INFO LogPusher: stopping
23/09/18 15:18:43 INFO ProcessLauncher: postprocessing
23/09/18 15:18:43 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last):
  File "/tmp/etl-glue-redshift-s3.py", line 51, in <module>
    target_Redshift = glueContext.write_dynamic_frame.from_options(
  File "/opt/amazon/lib/python3.7/site-packages/awsglue/dynamicframe.py", line 640, in from_options
    return self._glue_context.write_dynamic_frame_from_options(frame,
  File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 337, in write_dynamic_frame_from_options
    return self.write_from_options(frame, connection_type,
  File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 355, in write_from_options
    sink = self.getSink(connection_type, format, transformation_ctx, **new_options)
  File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 317, in getSink
    j_sink = self._ssql_ctx.getSink(connection_type,
  File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 13
```
This suggests the connection to the Redshift database is failing. However, the transformed CSV file is still written to S3 correctly.
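For comparison, the S3 write in the same script (the one that succeeds) looks roughly like this; it's a sketch, and the bucket and path are placeholders:

```python
# Target: write the transformed frame to S3 as CSV (bucket/prefix are placeholders)
target_S3 = glueContext.write_dynamic_frame.from_options(
    frame=changeSchema_Transformation,
    connection_type="s3",
    format="csv",
    connection_options={
        "path": "s3://my-output-bucket/transformed/",
        "partitionKeys": [],
    },
    transformation_ctx="target_S3",
)
```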
The Redshift write in my ETL script:
```python
target_Redshift = glueContext.write_dynamic_frame.from_options(
    frame=changeSchema_Transformation,
    connection_type="redshift",
    connection_options={
        "redshiftTmpDir": f"s3://{redshiftTmpDir_s3}/temporary/",
        "useConnectionProperties": "true",
        "dbtable": targer_redshift_dbtable,
        "connectionName": redshift_connection_name,
        "aws-iam-user": target_iam_role_redshift,
        # "preactions": "CREATE TABLE IF NOT EXISTS .......",
    },
)
```
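The variables used in the write options are defined earlier in the script, along these lines (the actual values are placeholders for this post):

```python
# Defined earlier in the script; real values replaced with placeholders here
redshiftTmpDir_s3 = "my-temp-bucket"                       # bucket used for redshiftTmpDir
targer_redshift_dbtable = "public.my_table"                # target table in Redshift Serverless
redshift_connection_name = "my-redshift-serverless-conn"   # Glue connection name
target_iam_role_redshift = "arn:aws:iam::123456789012:role/my-glue-redshift-role"
```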
I am assuming that my Redshift Serverless setup is not properly configured to work with Glue ETL jobs, but I have seen plenty of YouTube videos where the same kind of ETL job succeeds against a provisioned Redshift cluster.
Please help me with this issue!