1 answer
By default, when you write with overwrite mode and do not specify which partitions to update, Spark overwrites the whole table.
Changing spark.sql.sources.partitionOverwriteMode to dynamic (I think you can also pass it as a write option) should make Spark detect which partitions are affected by your data and overwrite only those.
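For reference, a minimal PySpark sketch of both ways to enable this; the database, table, column names, and S3 path are placeholders, and the target table is assumed to already exist in the catalog:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Session-level setting (full property name): overwrite replaces only the
# partitions that are present in the incoming data.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

df = spark.createDataFrame(
    [("a", 1, "2024-01-02")],
    ["id", "value", "dt"],  # partition column(s) must come last for insertInto
)

# With the session setting above, this replaces only the dt=2024-01-02 partition
# of the existing table instead of the whole table.
df.write.mode("overwrite").insertInto("my_db.events")

# Per-write alternative for path-based writes (takes precedence over the session setting):
# df.write.mode("overwrite") \
#     .option("partitionOverwriteMode", "dynamic") \
#     .partitionBy("dt") \
#     .parquet("s3://my-bucket/events/")
```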
I tried both of the following methods and got the same issue in a Glue notebook while simultaneously reading from the table in Athena for testing. The table definition should not be completely recreated and its partitions re-added when "overwriting" a new partition. In my test, the data contains only a new partition value.
The biggest issue with this behavior is that, when a job is long-running, fails while writing data, or is killed, the table may be unreadable until it is manually recreated.
I also wonder if this type of behavior is because I am using Glue as the data catalog.
Update: this is not ideal, but I was able to get new partitions to append dynamically and not drop/recreate the table by using the following:
I don't think ".option("partitionOverwriteMode", "dynamic")" is doing anything without specifying the full property name.
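The actual snippet the update refers to is not reproduced in the thread. Below is a rough sketch of one append-based way to add only new partitions without dropping or recreating the table, assuming the incoming data contains only new partition values; the S3 path and column names are illustrative placeholders, not the poster's actual values.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Data containing only a new partition value, matching the test described above.
new_data = spark.createDataFrame(
    [("b", 2, "2024-01-03")],
    ["id", "value", "dt"],
)

# Append mode never drops or recreates the table: it only writes new files
# (and new partition directories) under the table location.
(new_data.write
    .mode("append")
    .partitionBy("dt")
    .parquet("s3://my-bucket/events/"))
```

With a path-based append like this, the new partition still has to be registered in the Glue Data Catalog (for example via MSCK REPAIR TABLE in Athena or a crawler) before it is queryable.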