How to add partitions in a Glue job without updating the table schema?
I created a Glue table and added a table description and column comments. I know the schema and it will not change. I have a Glue ETL job that adds partitions to this table. I'm trying to do this in two ways, according to the documentation (https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html):
**write_dynamic_frame_from_catalog**
additionalOptions = {"enableUpdateCatalog": True}
additionalOptions["partitionKeys"] = ["region", "year", "month", "day"]
sink = glueContext.write_dynamic_frame_from_catalog(frame=last_transform, database=<target_db_name>, table_name=<target_table_name>, transformation_ctx="write_sink", additional_options=additionalOptions)
**getSink**
sink = glueContext.getSink(
connection_type="s3",
path="<S3_output_path>",
enableUpdateCatalog=True,
partitionKeys=["region", "year", "month", "day"])
sink.setFormat("json")
sink.setCatalogInfo(catalogDatabase=<target_db_name>, catalogTableName=<target_table_name>)
sink.writeFrame(last_transform)
The problem is that the table is updated with the new partitions, but the column comments and the table description are deleted.
How can I keep the original description and column comments?
Hi,
Please note that, according to the documentation page you linked, what you are experiencing is the default behavior:
"You can also set the updateBehavior value to LOG if you want to prevent your table schema from being overwritten, but still want to add the new partitions. The default value of updateBehavior is UPDATE_IN_DATABASE, so if you don’t explicitly define it, then the table schema will be overwritten."
The code should look like:
additionalOptions = {
"enableUpdateCatalog": True,
"updateBehavior": "LOG"}
additionalOptions["partitionKeys"] = ["partition_key0", "partition_key1"]
sink = glueContext.write_dynamic_frame_from_catalog(frame=last_transform, database=<dst_db_name>,
table_name=<dst_tbl_name>, transformation_ctx="write_sink",
additional_options=additionalOptions)
job.commit()
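The same setting can be applied to the getSink variant from the question. The sketch below assumes that getSink accepts updateBehavior as a keyword argument, as shown in the linked documentation; the wrapper function `write_partitions_only` and its parameter names are hypothetical and exist only to keep the example self-contained.

```python
def write_partitions_only(glue_context, frame, s3_path, database, table, partition_keys):
    """Write `frame` to S3 and register new partitions in the Data Catalog.

    Hypothetical helper: `glue_context` is expected to be an awsglue GlueContext
    and `frame` a DynamicFrame. updateBehavior="LOG" asks Glue to leave the
    existing table schema alone while still adding partitions.
    """
    sink = glue_context.getSink(
        connection_type="s3",
        path=s3_path,
        enableUpdateCatalog=True,
        updateBehavior="LOG",  # default is UPDATE_IN_DATABASE, which overwrites the schema
        partitionKeys=partition_keys,
    )
    sink.setFormat("json")
    sink.setCatalogInfo(catalogDatabase=database, catalogTableName=table)
    sink.writeFrame(frame)
    return sink
```

Inside the job it would be called with the existing glueContext and last_transform, followed by job.commit() as above.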
Thank you for the feedback. I will research it a bit more.
Hi,
Thanks for your answer, but I tried with "updateBehavior": "LOG", and the table description and column comments were still updated.
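If the metadata is still overwritten with updateBehavior set to LOG, one possible workaround is to snapshot the table definition before the write and restore the description and column comments afterwards. This is only a sketch and not something the linked documentation describes: the helper names (`restore_metadata`, `snapshot_table`, `restore_table`) are hypothetical, and it assumes the job role is allowed to call glue:GetTable and glue:UpdateTable. The `glue` argument is expected to be a boto3 Glue client, i.e. `boto3.client("glue")`.

```python
import copy

# Keys accepted in TableInput by UpdateTable. get_table also returns read-only
# keys (CreateTime, VersionId, DatabaseName, ...) that have to be dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def _columns(table):
    """All column dicts of a table definition: data columns plus partition keys."""
    data_cols = table.get("StorageDescriptor", {}).get("Columns", [])
    return data_cols + table.get("PartitionKeys", [])


def restore_metadata(snapshot, current):
    """Build a TableInput from `current`, with description and comments from `snapshot`."""
    table_input = {
        key: copy.deepcopy(value)
        for key, value in current.items()
        if key in TABLE_INPUT_KEYS
    }
    if "Description" in snapshot:
        table_input["Description"] = snapshot["Description"]
    comments = {c["Name"]: c["Comment"] for c in _columns(snapshot) if "Comment" in c}
    for col in _columns(table_input):
        if col["Name"] in comments:
            col["Comment"] = comments[col["Name"]]
    return table_input


def snapshot_table(glue, database, table):
    """Fetch the table definition before the job writes."""
    return glue.get_table(DatabaseName=database, Name=table)["Table"]


def restore_table(glue, database, snapshot):
    """Re-apply the saved description and comments after the job has written."""
    current = glue.get_table(DatabaseName=database, Name=snapshot["Name"])["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=restore_metadata(snapshot, current),
    )
```

In the job, `snapshot_table` would run before the sink write and `restore_table` after it. Partitions are separate catalog objects, so updating the table definition this way should leave the newly added partitions in place.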