1 Answer
If writing a plain file is fast, I suspect the performance issue is with "partitionKeys=partition_cols_list": those columns may be too granular, which forces the job to write lots of tiny files. Also, the count on the converted DataFrame might cause the data to be processed twice.
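Both suspicions can be checked before the write. The following is only a rough sketch: the helper names and the `max_partitions` threshold are made up for illustration, and it assumes `df` is a Spark DataFrame (for example the `inc_df` from the question). It caches the DataFrame so the count and the later write do not recompute it, and it counts how many distinct partition-key combinations the write would produce.

```python
def partition_count(df, partition_cols):
    """Number of distinct partition-key combinations, i.e. output directories."""
    return df.select(*partition_cols).distinct().count()


def check_before_write(df, partition_cols, max_partitions=1000):
    """Return basic stats about a DataFrame ahead of a partitioned write.

    max_partitions is an arbitrary illustrative threshold, not a Spark or
    Glue setting; pick a value that makes sense for your data volume.
    """
    # cache() marks the DataFrame for reuse so that the count below and the
    # later write do not both recompute the full lineage.
    df = df.cache()
    rows = df.count()
    parts = partition_count(df, partition_cols)
    return {
        "rows": rows,
        "partitions": parts,
        "too_granular": parts > max_partitions,
    }
```

For example, `check_before_write(inc_df, partition_cols_list)` returns a dict; if `partitions` is close to `rows`, nearly every row gets its own directory and a coarser partition key is probably worth trying.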
Since you already have a DataFrame, the DataFrame writer is faster at table partitioning. As long as you are not writing to an S3 Lake Formation location, you can achieve the same result with something like this (I haven't tested it):
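from pyspark.sql.functions import col
# partitionedBy takes columns as separate arguments, not a list, so unpack it
inc_df.writeTo(f'{target_db}.{dataset_dict_obj.get("table_name")}').partitionedBy(*[col(c) for c in partition_cols_list]).createOrReplace()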