Read data from a Redshift table and write to another Redshift table using AWS Glue PySpark


I am trying to read data from Redshift schema_a and write the output into another Redshift table in schema_b. Below is the code I am using to read from Redshift:

tmp_dir="s3://bjs-digital-dods-data-lake-processed/temp/rt_fact_payment/"
sample_querry = "select * from abc.table_a where condition_a>10"
RedshiftCluster_node3 = glueContext.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "sampleQuery": sample_querry,
        "redshiftTmpDir": tmp_dir,
        "useConnectionProperties": "true",
        "connectionName": "dodsqa_redshift",
    },
    transformation_ctx="RedshiftCluster_node3",
)

When I run the above code, I get the following error:

caused by: com.amazon.redshift.util.RedshiftException: ERROR: S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid RERN2PVXA8V6RXJS,ExtRid fsd/1fDHyRPYoXYEc1I3o+LqUkOL6vaQm5gkqnVev6BFadXHqMP3aTLdtv7OB8nWLrOl8FouziY=,CanRetry 1 Detail: error: S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid RERN2PVXA8V6RXJS,ExtRid fsd/1fDHyRPYoXYEc1I3o+LqUkOL6vaQm5gkqnVev6BFadXHqMP3aTLdtv7OB8nWLrOl8FouziY=,CanRetry 1 code: 8001 context: Failed to initialize S3 output stream. S3 path: s3://bjs-digital-dods-data-lake-processed/temp/rt_fact_payment/62131958-fd04-4459-96a2-f15cdfe7c84a/0001_part_00.parquet query: 2496495 location: s3_parquet_unloader.cpp:419 process: query0_87_2496495 [pid=31904]

What am I missing here? Also, can I pass more than one query in "sampleQuery"? Any help would be much appreciated, as this is urgent.
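For reference, the write-back half I plan to run afterwards is roughly the sketch below; schema_b.table_b and the node names are placeholders, not my actual code:

# Sketch of the write side: load the frame read above into a table in schema_b,
# reusing the same connection and temp dir. "dbtable" is a placeholder name.
glueContext.write_dynamic_frame.from_options(
    frame=RedshiftCluster_node3,
    connection_type="redshift",
    connection_options={
        "redshiftTmpDir": tmp_dir,
        "useConnectionProperties": "true",
        "dbtable": "schema_b.table_b",
        "connectionName": "dodsqa_redshift",
    },
    transformation_ctx="RedshiftCluster_write",
)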

Joe
Asked 3 months ago · 358 views
3 Answers

The Glue role needs permission to read (s3:GetObject) from the temporary directory; it sounds like you only have list permission.

AWS (Expert)
Answered 3 months ago
  • But the Glue job has an S3 policy attached to it, and that policy includes READ, WRITE, and LIST permissions.


Hi,

According to the error message, it looks like the Glue role doesn't have the right permissions to read from the temporary directory.

To fix this, update the permissions of the Glue role so that it has s3:GetObject access on the temporary directory. You can do this from the IAM console: find the Glue role and edit (or attach) a policy that grants the required access.

In short, add the "GetObject" action, scoped to the temporary directory, to the role's policy. After updating the permissions, run your job again. If you are still having problems, let me know and I will be glad to assist further.
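For illustration only, an inline policy along the following lines would grant that access; the role and policy names below are placeholders, and the bucket/prefix match the tmp_dir from your question:

import json

import boto3

iam = boto3.client("iam")

# Placeholder policy granting the Glue job role read/write/list on the temp prefix.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::bjs-digital-dods-data-lake-processed/temp/rt_fact_payment/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::bjs-digital-dods-data-lake-processed",
        },
    ],
}

iam.put_role_policy(
    RoleName="my-glue-job-role",               # placeholder role name
    PolicyName="glue-redshift-tmpdir-access",  # placeholder policy name
    PolicyDocument=json.dumps(policy),
)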

Answered 3 months ago
  • Is it not Glue, but Redshift, that needs a role to access the S3 bucket? Per my actual logic, I am trying to read data from Redshift and write it back to another Redshift table, so in the backend Redshift writes into S3 and the data is then read from S3 back into Redshift. But yes, this whole process happens inside Glue, so I am a bit confused.

Accepted Answer

When I changed the tmp_dir location, this piece of code worked fine. The new tmp_dir location (S3 bucket) was the one Glue itself provides by default, and I have no idea why that makes a difference. In the Glue Studio visual editor, I added a Redshift source node to pull data from my Redshift DB and then looked at the script generated for it. In that generated code a tmp_dir was already set, and with that value everything worked. If I change that location to my own bucket, the code fails with the error mentioned above.

If any of you knows the reason for that, please share your comments.
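For anyone hitting the same thing, this is roughly how the Glue-provided default temp location can be picked up inside the job script, assuming the job passes the standard --TempDir argument (as console-created jobs normally do):

import sys

from awsglue.utils import getResolvedOptions

# Console-created Glue jobs normally receive --TempDir, pointing at a bucket
# the job role can already read and write; reuse it as the Redshift temp dir.
args = getResolvedOptions(sys.argv, ["TempDir"])
tmp_dir = args["TempDir"]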

Joe
Answered 3 months ago
  • The only possible reasons I can think of are that Redshift is in a different region (though then you should get a clearer error) or that the role attached to Redshift doesn't have permission to write to that bucket; see the sketch below.
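If the Redshift-side role is the issue, one thing worth trying is to pass that role explicitly so the UNLOAD/COPY against the temp location runs with it. This is only a sketch, assuming the cluster has an associated IAM role that can reach the temp bucket; the role ARN below is a placeholder:

# Sketch: "aws_iam_role" tells the connector which Redshift-associated IAM role
# to use for the UNLOAD/COPY to the temporary S3 location (ARN is a placeholder).
RedshiftCluster_node3 = glueContext.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "sampleQuery": sample_query,
        "redshiftTmpDir": tmp_dir,
        "useConnectionProperties": "true",
        "connectionName": "dodsqa_redshift",
        "aws_iam_role": "arn:aws:iam::123456789012:role/my-redshift-s3-access-role",
    },
    transformation_ctx="RedshiftCluster_node3",
)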
