Error while connecting AWS Glue to Redshift Serverless Workspace


I am using AWS Glue to migrate data from Glue Catalog to Redshift Serverless and S3 after some transformations.

Since I am learning Glue, I am using Redshift Serverless to save money.

So, I used two methods:

Method 1: Visual ETL

This method worked well. I was able to store the **transformed** file to S3 and to the Redshift Serverless database. The table is filled with data.

Method 2: Using a Spark script

When I use the script generated in Method 1 to create a new ETL job, the job fails with an error:

An error occurred while calling o86.getSink. The connection attempt failed.

The error log (from the driver and executor log streams):

23/09/18 15:18:43 INFO LogPusher: stopping
23/09/18 15:18:43 INFO ProcessLauncher: postprocessing
23/09/18 15:18:43 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last):
File "/tmp/etl-glue-redshift-s3.py", line 51, in <module>
target_Redshift = glueContext.write_dynamic_frame.from_options(
File "/opt/amazon/lib/python3.7/site-packages/awsglue/dynamicframe.py", line 640, in from_options
return self._glue_context.write_dynamic_frame_from_options(frame,
File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 337, in write_dynamic_frame_from_options
return self.write_from_options(frame, connection_type,
File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 355, in write_from_options
sink = self.getSink(connection_type, format, transformation_ctx, **new_options)
File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 317, in getSink
j_sink = self._ssql_ctx.getSink(connection_type,
File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 13

This means the connection to the Redshift database is failing. However, the transformed CSV file is being stored to S3 correctly.

I am using the script as follows:

ETL script:

target_Redshift = glueContext.write_dynamic_frame.from_options(
    frame=changeSchema_Transformation,
    connection_type="redshift",
    connection_options={
        "redshiftTmpDir": f"s3://{redshiftTmpDir_s3}/temporary/",
        "useConnectionProperties": "true",
        "dbtable": targer_redshift_dbtable,
        "connectionName": redshift_connection_name,
        "aws-iam-user": target_iam_role_redshift,
        # "preactions": "CREATE TABLE IF NOT EXISTS .......",
    },
)

I am assuming that Redshift Serverless is not properly configured to work with ETL jobs, but I have seen many YouTube videos where the ETL job is successful against a Redshift cluster.

Please help me with this issue!

Glue
asked 7 months ago · 351 views
1 Answer

When you author jobs using AWS Glue Visual ETL, many job properties are pre-populated automatically. When you choose to write your own script, you need to set these job properties yourself.

In your case, please check whether you have added the Redshift connection under the Job details tab > Advanced properties > Connections section.

Please also check the other job properties and ensure that they match the job that was created through Visual ETL.
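One way to verify this is to compare the two jobs from the command line. A sketch (the job names here are placeholders; assumes the AWS CLI is configured with permission to call `glue:GetJob`):

```shell
# List the connections attached to the working Visual ETL job.
aws glue get-job --job-name my-visual-etl-job \
  --query 'Job.Connections.Connections'

# List the connections attached to the failing script job.
# An empty result means no Redshift connection is attached to it.
aws glue get-job --job-name my-script-job \
  --query 'Job.Connections.Connections'

# Compare the remaining properties (IAM role, Glue version, default arguments).
aws glue get-job --job-name my-visual-etl-job \
  --query '{Role: Job.Role, GlueVersion: Job.GlueVersion, Args: Job.DefaultArguments}'
```

If the connection lists differ, attach the same Redshift connection to the script job (in the console under Job details, or via `aws glue update-job`) and re-run it.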

AWS
answered 7 months ago
