Schema was not updated after the Glue job run

I followed the code on this page, but the table schema was not updated after the Glue job ran: https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html

In the "Updating table schema" section, it says:

When the job finishes, view the modified schema on the console right away, without having to rerun the crawler. You can enable this feature by adding a few lines of code to your ETL script, as shown in the following examples. The code uses enableUpdateCatalog set to true, and also updateBehavior set to UPDATE_IN_DATABASE, which indicates to overwrite the schema and add new partitions in the Data Catalog during the job run.
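For reference, the pattern described there looks roughly like the sketch below. The helper name write_with_catalog_update and its parameters are only for illustration and are not part of my actual job. The glue_context argument is assumed to be the GlueContext of a running Glue job, and frame the DynamicFrame to write.

```python
def write_with_catalog_update(glue_context, frame, s3_path, database, table, partition_keys):
    """Write `frame` to S3 and ask Glue to update the Data Catalog in the same run.

    Hypothetical helper: `glue_context` is assumed to be an awsglue GlueContext
    and `frame` a DynamicFrame, both created elsewhere in the job script.
    """
    sink = glue_context.getSink(
        connection_type="s3",
        path=s3_path,
        enableUpdateCatalog=True,             # update the catalog during the job run
        updateBehavior="UPDATE_IN_DATABASE",  # overwrite schema, add new partitions
        partitionKeys=partition_keys,         # partition columns of the target table
    )
    sink.setFormat("glueparquet")
    sink.setCatalogInfo(catalogDatabase=database, catalogTableName=table)
    sink.writeFrame(frame)
    return sink
```

In a job script this would be called once, after the last transform, with the target database and table names.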

In my case, the schema of the source changes over time. For example, a new column might be added to the source table. However, after the Glue job finishes, the schema of the destination table in the Data Catalog is not updated: the new column is not added.

As a result, querying the table in Athena throws "HIVE_CURSOR_ERROR: Failed to read Parquet file". What is going wrong?

Asked 9 months ago · 1076 views
1 Answer
Hi,

Is your table partitioned? If not, you need to drop the table and re-create it. This is a known limitation in AWS Glue, as stated in the documentation:

Schema updates are not supported for non-partitioned tables (not using the "partitionKeys" option). https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html
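To check this programmatically, a minimal sketch along the following lines could work. The function names are hypothetical, and glue_client is assumed to be a boto3 Glue client, for example boto3.client("glue"). Note that deleting the catalog table removes only the metadata, not the data files in S3, so the table still has to be re-created afterwards, for example by the job or a crawler.

```python
def is_partitioned(glue_client, database, table):
    """Return True if the Data Catalog table defines at least one partition key."""
    response = glue_client.get_table(DatabaseName=database, Name=table)
    return bool(response["Table"].get("PartitionKeys"))


def drop_if_unpartitioned(glue_client, database, table):
    """Drop the catalog table only when it has no partition keys.

    Returns True if the table was dropped, False if it was left in place.
    """
    if is_partitioned(glue_client, database, table):
        return False
    glue_client.delete_table(DatabaseName=database, Name=table)
    return True
```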

AWS
Expert
Answered 9 months ago
