Error while connecting AWS Glue to a Redshift Serverless workgroup


I am using AWS Glue to migrate data from the Glue Catalog to Redshift Serverless and S3 after some transformations.

Since I am still learning Glue, I am using Redshift Serverless to save money.

I tried two methods:

Method 1: Visual ETL

This method worked well. The transformed file was stored to S3 and loaded into the Redshift Serverless database, and the table is populated with data.

Method 2: Spark script

When I use the script generated in Method 1 to create a new ETL job, the job fails with the following error:

An error occurred while calling o86.getSink. The connection attempt failed.

The error log (from the driver log stream):

23/09/18 15:18:43 INFO LogPusher: stopping
23/09/18 15:18:43 INFO ProcessLauncher: postprocessing
23/09/18 15:18:43 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last):
File "/tmp/etl-glue-redshift-s3.py", line 51, in <module>
target_Redshift = glueContext.write_dynamic_frame.from_options(
File "/opt/amazon/lib/python3.7/site-packages/awsglue/dynamicframe.py", line 640, in from_options
return self._glue_context.write_dynamic_frame_from_options(frame,
File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 337, in write_dynamic_frame_from_options
return self.write_from_options(frame, connection_type,
File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 355, in write_from_options
sink = self.getSink(connection_type, format, transformation_ctx, **new_options)
File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 317, in getSink
j_sink = self._ssql_ctx.getSink(connection_type,
File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 13



This means the connection to the Redshift database is failing. However, the transformed CSV file is being stored to S3 correctly.

The relevant part of the ETL script:

target_Redshift = glueContext.write_dynamic_frame.from_options(
    frame=changeSchema_Transformation,
    connection_type="redshift",
    connection_options={
        "redshiftTmpDir": f"s3://{redshiftTmpDir_s3}/temporary/",
        "useConnectionProperties": "true",
        "dbtable": targer_redshift_dbtable,
        "connectionName": redshift_connection_name,
        "aws-iam-user" : target_iam_role_redshift
        #"preactions": "CREATE TABLE IF NOT EXISTS .......",
    },
)

I suspect that Redshift Serverless is not properly configured to work with ETL jobs, but I have seen many YouTube videos where the ETL job succeeds against a Redshift cluster.

Please help me with this issue!

Glue
Asked 8 months ago · 381 views
1 Answer

When you author jobs using AWS Glue Visual ETL, many job properties are automatically pre-populated for you. When you write your own script, you need to set these job properties yourself.

In your case, please check whether you have added the Redshift connection under the Job details tab > Advanced properties > Connections section.

Please also check the other job properties and ensure that they match the job that was created through Visual ETL.

AWS
Answered 8 months ago
