How to add partitions in a Glue Job without updating the table schema?

I created a Glue table and added a description and column comments. I know the schema and it will not change. I have a Glue ETL job that adds partitions to this table. I'm trying to do this in two ways, following the documentation (https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html):

**write_dynamic_frame_from_catalog**

additionalOptions = {"enableUpdateCatalog": True}
additionalOptions["partitionKeys"] = ["region", "year", "month", "day"]
sink = glueContext.write_dynamic_frame_from_catalog(frame=last_transform, database=<target_db_name>, table_name=<target_table_name>, transformation_ctx="write_sink", additional_options=additionalOptions)

**getSink**

sink = glueContext.getSink(
    connection_type="s3", 
    path="<S3_output_path>",
    enableUpdateCatalog=True,
    partitionKeys=["region", "year", "month", "day"])
sink.setFormat("json")
sink.setCatalogInfo(catalogDatabase=<target_db_name>, catalogTableName=<target_table_name>)
sink.writeFrame(last_transform)

The problem: the table is updated with the new partitions, but the column comments and the table description are deleted.

How can I keep the original description and column comments?

Asked 2 years ago, 2752 views
1 Answer

Hi,

Please note that, according to the documentation page you linked, what you are experiencing is the default behavior:

You can also set the updateBehavior value to LOG if you want to prevent your table schema from being overwritten, but still want to add the new partitions. The default value of updateBehavior is UPDATE_IN_DATABASE, so if you don’t explicitly define it, then the table schema will be overwritten.

The code should look like:

additionalOptions = {
    "enableUpdateCatalog": True, 
    "updateBehavior": "LOG"}
additionalOptions["partitionKeys"] = ["partition_key0", "partition_key1"]

sink = glueContext.write_dynamic_frame_from_catalog(frame=last_transform, database=<dst_db_name>,
    table_name=<dst_tbl_name>, transformation_ctx="write_sink",
    additional_options=additionalOptions)
job.commit()
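
Since the question also uses getSink, here is a minimal sketch of the same option on that path. It assumes `updateBehavior` is accepted as a getSink keyword argument, as in the linked documentation's getSink examples; the wrapper function `write_partitions_log_only` and its parameter names are hypothetical, chosen only for illustration. Whether `LOG` also preserves the table description and column comments is not confirmed here.

```python
def write_partitions_log_only(glue_context, frame, s3_path, database, table, partition_keys):
    """Write `frame` through getSink with updateBehavior="LOG".

    Hypothetical helper: `glue_context` is expected to be an awsglue GlueContext
    and `frame` a DynamicFrame, as in the job script above.
    """
    sink = glue_context.getSink(
        connection_type="s3",
        path=s3_path,
        enableUpdateCatalog=True,
        # The default is UPDATE_IN_DATABASE, which overwrites the table schema.
        updateBehavior="LOG",
        partitionKeys=list(partition_keys),
    )
    sink.setFormat("json")
    sink.setCatalogInfo(catalogDatabase=database, catalogTableName=table)
    sink.writeFrame(frame)
    return sink
```

In the job it would be called as `write_partitions_log_only(glueContext, last_transform, "<S3_output_path>", <dst_db_name>, <dst_tbl_name>, ["region", "year", "month", "day"])`, followed by `job.commit()`.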
AWS
Expert
Answered 2 years ago
  • Hi,

    Thanks for your answer, but I tried with "updateBehavior": "LOG", and the table description and column comments were updated.

  • Thank you for the feedback. I will research it a bit more.
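
One way to narrow down the behavior reported in the comments is to snapshot the table's description and column comments before the job runs and compare them afterwards. The sketch below is only a diagnostic aid, not a fix. It assumes the table dictionary has the shape returned by boto3's Glue `get_table` call (`Description`, `StorageDescriptor.Columns`, `PartitionKeys`); the helper names `snapshot_metadata`, `diff_metadata`, and `fetch_table` are hypothetical.

```python
def snapshot_metadata(table):
    """Extract the description and per-column comments from a Glue table dict."""
    columns = table.get("StorageDescriptor", {}).get("Columns", []) + table.get("PartitionKeys", [])
    return {
        "description": table.get("Description"),
        "comments": {col["Name"]: col.get("Comment") for col in columns},
    }


def diff_metadata(before, after):
    """Return the names whose description or comment changed between two snapshots."""
    changed = []
    if before["description"] != after["description"]:
        changed.append("<table description>")
    for name, comment in before["comments"].items():
        if after["comments"].get(name) != comment:
            changed.append(name)
    return changed


def fetch_table(database, table_name):
    """Read the current table definition from the Glue Data Catalog."""
    import boto3  # imported lazily so the helpers above work without AWS access

    return boto3.client("glue").get_table(DatabaseName=database, Name=table_name)["Table"]
```

Taking `before = snapshot_metadata(fetch_table(db, tbl))` ahead of the job run and `after` once it finishes, an empty `diff_metadata(before, after)` means the description and comments survived; any names it returns are the ones that changed.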
