Cannot save Pushdown Predicate for S3 source's Glue Table in Glue Job visual mode


We are working on a Glue ETL job in visual mode. It has multiple S3 source nodes (of the Glue Data Catalog type), and we want to use a pushdown predicate on them to implement a daily update task.

  • The Glue table format is copy-on-write (COW) Apache Hudi, created natively by a Glue ETL job
  • The Glue table was created successfully, with its partitions registered at the Hudi, Hive, and Glue levels
  • The table's partition columns are a mix of string and integer types

We created the predicate string, entered it in the S3 source node, and saved the job. After refreshing the page, we found that the predicate option had not been retained. Inspecting the requests sent from the AWS console to the Glue REST API showed that the console ignored the setting: it was not included in the POST request and was not saved to the ETL Python script.

Because of this limitation (or bug), we have to clone the job to script-only mode and add the option manually. How can we save the configuration while keeping the job in visual mode?
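For readers who need the script-only workaround described above, here is a minimal sketch of adding the option by hand. It assumes the source is read with `create_dynamic_frame.from_catalog` and its `push_down_predicate` argument. The partition columns (`region` as a string, `year`/`month`/`day` as integers) and the helper names `build_daily_predicate` and `read_partition` are hypothetical and only illustrate a mixed string/integer partition layout; they are not taken from the original job.

```python
from datetime import date


def build_daily_predicate(day: date, region: str) -> str:
    """Build a predicate string that selects one day's partition.

    Hypothetical layout: `region` is a string partition column and
    `year`, `month`, `day` are integer partition columns.
    """
    # String partition values are quoted; integer values are not.
    # Note: this sketch does not escape quotes inside `region`.
    return (
        f"region = '{region}' AND year = {day.year} "
        f"AND month = {day.month} AND day = {day.day}"
    )


def read_partition(glue_context, database: str, table: str, predicate: str):
    """Read only the matching partitions from a Glue Data Catalog table.

    `glue_context` is expected to be an awsglue GlueContext created
    inside the Glue job; it is passed in so this file has no hard
    dependency on the Glue runtime.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table,
        push_down_predicate=predicate,
    )


predicate = build_daily_predicate(date(2024, 1, 15), "ap-northeast-1")
print(predicate)
```

Inside the job script, `read_partition(glueContext, "my_db", "my_table", predicate)` would replace the generated source node call; the database and table names here are placeholders.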

  • I cannot reproduce that. Maybe something in the predicate breaks the saving. Can you share it?

Asked 8 months ago, 175 views
1 Answer
Accepted Answer

As of today, the issue has been fixed, and we can modify pushdown_predicate from the Glue interface. The Spark UI log confirms that the filter is applied.

Answered 7 months ago
