Error while connecting AWS Glue to Redshift Serverless Workspace


I am using AWS Glue to migrate data from Glue Catalog to Redshift Serverless and S3 after some transformations.

Since I am still learning Glue, I am using Redshift Serverless to save money.

So I tried two methods:

Method 1: Visual ETL

This method worked well. I was able to store the **transformed** file to S3 and to the Redshift Serverless database, and the table is filled with data.

Method 2: Using the generated Spark script

When I use the script generated in Method 1 to create a new ETL job, the job fails with this error:

An error occurred while calling o86.getSink. The connection attempt failed.

The error log, from the driver and executor log streams:

23/09/18 15:18:43 INFO LogPusher: stopping
23/09/18 15:18:43 INFO ProcessLauncher: postprocessing
23/09/18 15:18:43 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last):
  File "/tmp/etl-glue-redshift-s3.py", line 51, in <module>
    target_Redshift = glueContext.write_dynamic_frame.from_options(
  File "/opt/amazon/lib/python3.7/site-packages/awsglue/dynamicframe.py", line 640, in from_options
    return self._glue_context.write_dynamic_frame_from_options(frame,
  File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 337, in write_dynamic_frame_from_options
    return self.write_from_options(frame, connection_type,
  File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 355, in write_from_options
    sink = self.getSink(connection_type, format, transformation_ctx, **new_options)
  File "/opt/amazon/lib/python3.7/site-packages/awsglue/context.py", line 317, in getSink
    j_sink = self._ssql_ctx.getSink(connection_type,
  File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 13



This means the connection to the Redshift database is failing. However, the transformed CSV file is still being written to S3 correctly.

The relevant part of the ETL script:

target_Redshift = glueContext.write_dynamic_frame.from_options(
    frame=changeSchema_Transformation,
    connection_type="redshift",
    connection_options={
        "redshiftTmpDir": f"s3://{redshiftTmpDir_s3}/temporary/",
        "useConnectionProperties": "true",
        "dbtable": targer_redshift_dbtable,
        "connectionName": redshift_connection_name,
        # note: the documented option name is "aws_iam_role", not "aws-iam-user"
        "aws_iam_role": target_iam_role_redshift,
        # "preactions": "CREATE TABLE IF NOT EXISTS .......",
    },
)
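For reference, this is how I read the documented Redshift connection options for `from_options`. All values below are placeholders, and `aws_iam_role` is the option name given in the Glue docs; I am not sure `aws-iam-user` is recognized at all:

```python
# Placeholder values; substitute your own bucket, table, connection name, and role ARN.
redshiftTmpDir_s3 = "my-temp-bucket"
target_redshift_dbtable = "public.my_table"
redshift_connection_name = "my-redshift-connection"
target_iam_role_redshift = "arn:aws:iam::123456789012:role/my-glue-redshift-role"

# Option names as documented for the Glue Redshift sink.
connection_options = {
    "redshiftTmpDir": f"s3://{redshiftTmpDir_s3}/temporary/",
    "useConnectionProperties": "true",   # reuse JDBC properties from the Glue connection
    "dbtable": target_redshift_dbtable,
    "connectionName": redshift_connection_name,
    "aws_iam_role": target_iam_role_redshift,  # documented name: underscores, not hyphens
}
```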

I am assuming that Redshift Serverless is not properly configured to work with ETL jobs, but I have seen plenty of YouTube videos where the ETL job succeeds against a Redshift cluster.

Please help me with this issue!

Glue
Asked 8 months ago · 381 views
1 Answer

When you author jobs using AWS Glue Visual ETL, many job properties are automatically pre-populated for you. When you choose to create your own script, you need to set these job properties yourself.

In your case, please check that you have added the Redshift connection under the Job details tab > Advanced properties > Connections section.

Please also check the other job properties and ensure that they match those of the job created through Visual ETL.
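The connection can also be attached programmatically. Below is a minimal sketch of the `update_job` call pattern using boto3, assuming a hypothetical job and connection name; in practice you would first fetch the current definition with `get_job`, since `JobUpdate` must restate required fields such as `Role` and `Command`:

```python
# Placeholder for the job definition you would normally fetch with:
#   import boto3
#   existing_job = boto3.client("glue").get_job(JobName="etl-glue-redshift-s3")["Job"]
existing_job = {
    "Name": "etl-glue-redshift-s3",
    "Role": "arn:aws:iam::123456789012:role/my-glue-role",  # placeholder role ARN
    "Command": {"Name": "glueetl", "ScriptLocation": "s3://my-bucket/etl-glue-redshift-s3.py"},
    "GlueVersion": "4.0",
}

# JobUpdate must carry Role and Command again, so copy them from the
# existing definition and add the Glue connection for Redshift.
job_update = {
    "Role": existing_job["Role"],
    "Command": existing_job["Command"],
    "GlueVersion": existing_job["GlueVersion"],
    "Connections": {"Connections": ["my-redshift-connection"]},  # placeholder connection name
}

# Uncomment to apply against a real account:
# boto3.client("glue").update_job(JobName=existing_job["Name"], JobUpdate=job_update)
```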

AWS
Answered 8 months ago
