AWS Glue Studio node: long run time for data preview

Hi, I am using AWS Glue Studio to read from a DynamoDB (DDB) table through a direct DDB connection. So far my visual diagram has two nodes:

  1. Source DDB table node -> Preview takes 5 minutes for a dataset of only 2 rows, but it at least shows a result.
  2. Transform - SelectFields -> The session runs for a long time (>20 minutes) and then fails with the error 'session not ready'.

My DDB table is 691 bytes, with provisioned capacity of 5 RCU and 5 WCU. The Glue job details have the following config:

  1. Glue version -> 4.0
  2. Language -> Python 3
  3. Worker type -> G.1X (automatic scaling of the number of workers is enabled)
  4. Max number of workers -> 11
  5. Job timeout -> 2880 minutes

Considering this is such a small dataset, can you please let me know why it takes so long to run, or where to look for related insights? I am hoping to use this as part of my production data pipeline, which will transform and move data to Redshift for data warehousing purposes. Unfortunately, there isn't much information available on Glue Studio.

Asked 2 months ago, 222 views
1 Answer

First of all, I would suggest using on-demand mode in DynamoDB, at least until you get it working correctly. When the table has 5 RCU, Glue takes that number as a limit and rate-limits its requests so as not to exceed it. But I suspect you may have other issues.

Moreover, DynamoDB is releasing zero-ETL integration with Redshift, which is now in private preview, so perhaps it's advisable not to spend too much time reinventing the wheel. https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-dynamodb-zero-etl-integration-redshift/
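
On the RCU point, a quick back-of-the-envelope check helps show whether the capacity limit alone could explain a multi-minute preview. The sketch below is only an estimate under stated assumptions: the helper name `estimated_scan_seconds` is hypothetical, the read is assumed to be an eventually consistent scan (0.5 RCU per 4 KB read), and the reader is assumed to use half of the provisioned RCU, which I believe matches the default of the Glue `dynamodb.throughput.read.percent` option. It ignores session startup and network overhead entirely.

```python
import math


def estimated_scan_seconds(table_bytes, provisioned_rcu,
                           read_percent=0.5, consistent=False):
    """Rough lower bound on the time to scan a table under an RCU budget.

    Assumptions (not taken from the thread): capacity is charged per 4 KB
    of data read, eventually consistent reads cost half as much, and the
    reader limits itself to read_percent of the provisioned RCU.
    """
    # Number of 4 KB units the scan has to read (at least one).
    units_4kb = max(1, math.ceil(table_bytes / 4096))
    # RCU consumed: 1 per unit if strongly consistent, 0.5 otherwise.
    rcu_needed = units_4kb * (1.0 if consistent else 0.5)
    # RCU per second the reader allows itself to consume.
    rcu_budget_per_second = provisioned_rcu * read_percent
    return rcu_needed / rcu_budget_per_second


# Numbers from the question: a 691-byte table with 5 provisioned RCU.
print(f"{estimated_scan_seconds(691, 5):.2f} s")
```

With the figures from the question this comes out to a fraction of a second, so under these assumptions the 5 RCU limit by itself does not account for a 5-minute preview, which is consistent with the suspicion that something else is going on.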

AWS
Expert
Answered 2 months ago
  • Hi Leeroy, thanks for the prompt response and for pointing me to the zero-ETL integration with Redshift. While we wait for our account to be allow-listed for the preview, can you please let me know what other parts of the config I should look at to speed up the preview of the sample dataset? I have changed the DDB tables to on-demand mode, but the preview has not really sped up yet.
