How to add partitions from a Glue job without updating the table schema?

I created a Glue table and added a table description and column comments. I know the schema, and it will not change. I have a Glue ETL job that adds partitions to this table. I'm trying to do this in two ways, following the documentation (https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html):

**write_dynamic_frame_from_catalog**

additionalOptions = {"enableUpdateCatalog": True}
additionalOptions["partitionKeys"] = ["region", "year", "month", "day"]
sink = glueContext.write_dynamic_frame_from_catalog(
    frame=last_transform,
    database=<target_db_name>,
    table_name=<target_table_name>,
    transformation_ctx="write_sink",
    additional_options=additionalOptions)

**getSink**

sink = glueContext.getSink(
    connection_type="s3", 
    path="<S3_output_path>",
    enableUpdateCatalog=True,
    partitionKeys=["region", "year", "month", "day"])
sink.setFormat("json")
sink.setCatalogInfo(catalogDatabase=<target_db_name>, catalogTableName=<target_table_name>)
sink.writeFrame(last_transform)

The problem: the table is updated with the new partitions, but the column comments and the table description are deleted.

How can I keep the original description and column comments?

asked 2 years ago · 2753 views
1 Answer

Hi,

Please note that, according to the documentation page you linked, what you are experiencing is the default behavior:

You can also set the updateBehavior value to LOG if you want to prevent your table schema from being overwritten, but still want to add the new partitions. The default value of updateBehavior is UPDATE_IN_DATABASE, so if you don’t explicitly define it, then the table schema will be overwritten.

The code should look like:

additionalOptions = {
    "enableUpdateCatalog": True, 
    "updateBehavior": "LOG"}
additionalOptions["partitionKeys"] = ["partition_key0", "partition_key1"]

sink = glueContext.write_dynamic_frame_from_catalog(frame=last_transform, database=<dst_db_name>,
    table_name=<dst_tbl_name>, transformation_ctx="write_sink",
    additional_options=additionalOptions)
job.commit()
AWS
EXPERT
answered 2 years ago
  • Hi,

    Thanks for your answer, but I tried with "updateBehavior": "LOG", and the table description and column comments were updated.

  • Thank you for the feedback. I will research it a bit more.
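
Since the comment above reports that "updateBehavior": "LOG" did not preserve the description and comments, one workaround that may be worth trying is to leave enableUpdateCatalog off, write the data to S3 as usual, and register the new partitions directly through the Glue API. The sketch below is untested against this table and rests on several assumptions: the helper names build_partition_input and add_partitions are hypothetical, the caller is assumed to pass a boto3 Glue client (boto3.client("glue")), and the partition locations are assumed to follow the usual key=value layout under the table path. It relies on the get_table and batch_create_partition calls of the Glue API, which create partitions without modifying the table definition.

```python
import copy


def build_partition_input(table, values, location):
    """Build a PartitionInput that reuses the table's storage descriptor.

    Only the Location is changed, so the partition inherits the table's
    format and SerDe settings. (Hypothetical helper, not part of AWS Glue.)
    """
    descriptor = copy.deepcopy(table["StorageDescriptor"])
    descriptor["Location"] = location
    return {"Values": list(values), "StorageDescriptor": descriptor}


def add_partitions(glue_client, database, table_name, partitions):
    """Register partitions without touching the table definition.

    partitions is a list of (values, s3_location) tuples, for example
    (["us", "2023", "01", "15"], "<S3_output_path>/region=us/year=2023/month=01/day=15/").
    Returns the list of errors reported by Glue (for example, partitions
    that already exist).
    """
    table = glue_client.get_table(DatabaseName=database, Name=table_name)["Table"]
    inputs = [build_partition_input(table, values, loc) for values, loc in partitions]

    errors = []
    # batch_create_partition accepts at most 100 partitions per call.
    for start in range(0, len(inputs), 100):
        response = glue_client.batch_create_partition(
            DatabaseName=database,
            TableName=table_name,
            PartitionInputList=inputs[start:start + 100],
        )
        errors.extend(response.get("Errors", []))
    return errors
```

In the job, this would be called after the write step, with the partition values taken from the data that was just written. The job role needs permission for glue:GetTable and glue:BatchCreatePartition. Because the table itself is never updated, the description and column comments should stay as they are, but please verify this on a test table first.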
