How to add partitions in a Glue Job without updating the table schema?


I created a Glue table and added a description and column comments. I know the schema, and it will not change. I have a Glue ETL job that adds partitions to this table. I'm trying to do this in two ways, following the documentation (https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html):

**write_dynamic_frame_from_catalog**

additionalOptions = {
    "enableUpdateCatalog": True,
    "partitionKeys": ["region", "year", "month", "day"]}

sink = glueContext.write_dynamic_frame_from_catalog(frame=last_transform, database=<target_db_name>,
    table_name=<target_table_name>, transformation_ctx="write_sink",
    additional_options=additionalOptions)

**getSink**

sink = glueContext.getSink(
    connection_type="s3", 
    path="<S3_output_path>",
    enableUpdateCatalog=True,
    partitionKeys=["region", "year", "month", "day"])
sink.setFormat("json")
sink.setCatalogInfo(catalogDatabase=<target_db_name>, catalogTableName=<target_table_name>)
sink.writeFrame(last_transform)

The problem: the table is updated with the new partitions, but the column comments and the table description are deleted.

How can I keep the original description and column comments?

asked 2 years ago · 2752 views
1 Answer

Hi,

Please note that, according to the documentation page you linked, what you are experiencing is the default behavior:

You can also set the updateBehavior value to LOG if you want to prevent your table schema from being overwritten, but still want to add the new partitions. The default value of updateBehavior is UPDATE_IN_DATABASE, so if you don’t explicitly define it, then the table schema will be overwritten.

The code should look like:

additionalOptions = {
    "enableUpdateCatalog": True, 
    "updateBehavior": "LOG"}
additionalOptions["partitionKeys"] = ["partition_key0", "partition_key1"]

sink = glueContext.write_dynamic_frame_from_catalog(frame=last_transform, database=<dst_db_name>,
    table_name=<dst_tbl_name>, transformation_ctx="write_sink",
    additional_options=additionalOptions)
job.commit()
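If the description or comments are still missing after the write, one possible workaround is to snapshot them before the job writes and reapply them afterwards. The following is a minimal sketch, not a confirmed fix. It assumes boto3 is available in the job; `snapshot_comments`, `restore_comments`, and `TABLE_INPUT_KEYS` are hypothetical names, and the key list is an assumed, conservative subset of the fields that `update_table` accepts in `TableInput`.

```python
import copy

# Assumption: a conservative subset of the TableInput fields accepted by
# update_table. Read-only fields returned by get_table (CreateTime,
# DatabaseName, ...) are dropped by filtering on this set.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
}


def _all_columns(table):
    """Return data columns followed by partition columns (same dict objects)."""
    return (table.get("StorageDescriptor", {}).get("Columns", [])
            + table.get("PartitionKeys", []))


def snapshot_comments(table):
    """Capture the description and column comments from a get_table 'Table' dict."""
    return {
        "description": table.get("Description"),
        "comments": {c["Name"]: c["Comment"]
                     for c in _all_columns(table) if c.get("Comment")},
    }


def restore_comments(table, snapshot):
    """Build a TableInput from `table` with the saved metadata reapplied."""
    table_input = {k: copy.deepcopy(v) for k, v in table.items()
                   if k in TABLE_INPUT_KEYS}
    if snapshot["description"]:
        table_input["Description"] = snapshot["description"]
    for col in _all_columns(table_input):
        if col["Name"] in snapshot["comments"]:
            col["Comment"] = snapshot["comments"][col["Name"]]
    return table_input


# Possible usage inside the job (db and tbl are placeholders):
#   glue = boto3.client("glue")
#   before = snapshot_comments(glue.get_table(DatabaseName=db, Name=tbl)["Table"])
#   ... write the frame ...
#   after = glue.get_table(DatabaseName=db, Name=tbl)["Table"]
#   glue.update_table(DatabaseName=db, TableInput=restore_comments(after, before))
```

The restore step keeps whatever schema the table has after the write and only puts the description and comments back, so it should be safe to run even when nothing was lost.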
AWS
EXPERT
answered 2 years ago
  • Hi,

    Thanks for your answer, but I tried with "updateBehavior": "LOG", and the table description and column comments were updated.

  • Thank you for the feedback. I will research it a bit more.
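Another option that may be worth trying is to leave `enableUpdateCatalog` off entirely, write the data to S3 (for example with `glueContext.write_dynamic_frame.from_options`), and register the new partitions directly through the Glue API, so the job never rewrites the table definition. This is a rough sketch under assumptions: boto3 is available, the data is written with Hive-style paths (`region=.../year=...`), and `build_partition_input` and `register_partitions` are hypothetical helper names. It does not inspect the `Errors` list that `batch_create_partition` returns, for example for partitions that already exist.

```python
import copy


def build_partition_input(table, values):
    """Build one PartitionInput from a get_table 'Table' dict and partition values."""
    keys = [k["Name"] for k in table["PartitionKeys"]]
    if len(values) != len(keys):
        raise ValueError(f"expected {len(keys)} partition values, got {len(values)}")
    # Reuse the table's storage descriptor and point it at the partition folder.
    sd = copy.deepcopy(table["StorageDescriptor"])
    suffix = "/".join(f"{k}={v}" for k, v in zip(keys, values))
    sd["Location"] = sd["Location"].rstrip("/") + "/" + suffix + "/"
    return {"Values": [str(v) for v in values], "StorageDescriptor": sd}


def register_partitions(glue, database, table_name, partition_values):
    """Register partitions without touching the table definition itself."""
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    inputs = [build_partition_input(table, v) for v in partition_values]
    # batch_create_partition accepts at most 100 partitions per call.
    for i in range(0, len(inputs), 100):
        glue.batch_create_partition(
            DatabaseName=database,
            TableName=table_name,
            PartitionInputList=inputs[i:i + 100],
        )


# Possible usage inside the job (placeholders as in the question):
#   glue = boto3.client("glue")
#   register_partitions(glue, db, tbl, [["us-east-1", "2023", "01", "15"]])
```

Because only partition entries are created, the table description and column comments are not part of the request at all.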
