How add partitions on Glue Job without update table schema?

0

I created a Glue Table and added description and comments in the columns. I know the schema and it will not change. I have a Glue Job ETL that adds partitions to this table. I'm trying to do this in two ways, according to the documentation (https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html):

**write_dynamic_frame_from_catalog **

additionalOptions = {"enableUpdateCatalog": True}
additionalOptions["partitionKeys"] = ["region", "year", "month", "day"]
sink = glueContext.write_dynamic_frame_from_catalog(frame=last_transform, database=<target_db_name>, table_name=<target_table_name>, transformation_ctx="write_sink", additional_options=additionalOptions)

**getSink **

sink = glueContext.getSink(
    connection_type="s3", 
    path="<S3_output_path>",
    enableUpdateCatalog=True,
    partitionKeys=["region", "year", "month", "day"])
sink.setFormat("json")
sink.setCatalogInfo(catalogDatabase=<target_db_name>, catalogTableName=<target_table_name>)
sink.writeFrame(last_transform)

The problem is: the table is updated with the new partitions, but the column comments have been deleted, including the table description.

How to keep the original description and column comments?

已提问 2 年前2753 查看次数
1 回答
0

Hi ,

please note that , as by the documentation page you linked, what you are experiencing is the default behaviuor:

You can also set the updateBehavior value to LOG if you want to prevent your table schema from being overwritten, but still want to add the new partitions. The default value of updateBehavior is UPDATE_IN_DATABASE, so if you don’t explicitly define it, then the table schema will be overwritten.

The code should look like:

additionalOptions = {
    "enableUpdateCatalog": True, 
    "updateBehavior": "LOG"}
additionalOptions["partitionKeys"] = ["partition_key0", "partition_key1"]

sink = glueContext.write_dynamic_frame_from_catalog(frame=last_transform, database=<dst_db_name>,
    table_name=<dst_tbl_name>, transformation_ctx="write_sink",
    additional_options=additionalOptions)
job.commit()
AWS
专家
已回答 2 年前
  • Hi,

    Thanks for your answer, but I tried with "updateBehavior": "LOG", and the table description and column comments were updated.

  • thank you for the feed back I will research it a bit more

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则