Glue ETL does not update the Data Catalog

Hello, I am creating a Glue job that converts CSV files to partitioned Parquet files, and I want the ETL job to update the Data Catalog. I am using the following code to do this:

dynamic_frame: DynamicFrame = DynamicFrame.fromDF(final_data, glue_context, f"{file_type}_dataset")
sink = glue_context.getSink(
    connection_type="s3",
    enableUpdateCatalog=True,
    updateBehavior="UPDATE_IN_DATABASE",
    path=f"{target}",
    partitionKeys=partition_cols)

sink.setFormat("glueparquet")
sink.setCatalogInfo(catalogDatabase=conf.get_db_name(),
                    catalogTableName=conf.get_table_name_by_source(file_type))
sink.writeFrame(dynamic_frame)

As you can see, I convert a Spark DataFrame to a Glue DynamicFrame and write it out as Parquet.

The output Parquet files are written, but I get the following error and no table appears in the Data Catalog:

Exception: Problem processing file type cdr_cs because An error occurred while calling o477.pyWriteDynamicFrame.
: scala.MatchError: (null,false) (of class scala.Tuple2)
	at com.amazonaws.services.glue.DataSink.forwardPotentialDynamicFrameToCatalog(DataSink.scala:177)
	at com.amazonaws.services.glue.DataSink.forwardPotentialDynamicFrameToCatalog(DataSink.scala:135)
	at com.amazonaws.services.glue.sinks.HadoopDataSink.$anonfun$writeDynamicFrame$2(HadoopDataSink.scala:302)
	at com.amazonaws.services.glue.util.FileSchemeWrapper.$anonfun$executeWithQualifiedScheme$1(FileSchemeWrapper.scala:77)
	at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWith(FileSchemeWrapper.scala:70)
	at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWithQualifiedScheme(FileSchemeWrapper.scala:77)
	at com.amazonaws.services.glue.sinks.HadoopDataSink.$anonfun$writeDynamicFrame$1(HadoopDataSink.scala:157)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
	at com.amazonaws.services.glue.sinks.HadoopDataSink.writeDynamicFrame(HadoopDataSink.scala:151)
	at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:64)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl
Asked 5 months ago, 32 views
1 Answer

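This does not provide all the information needed to diagnose the problem. Because the error occurs while writing the DynamicFrame, one possible cause is a mismatch between the file layout and the table definition in the Glue Data Catalog.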
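One way to test the layout-mismatch theory is to compare the DataFrame's columns and partition keys with whatever table definition the catalog currently holds. The following is a minimal sketch, not a confirmed fix for the MatchError above. The helper names `fetch_table` and `diff_layout` are hypothetical, and the comparison assumes that Spark's simple type strings (such as `string` or `bigint`) line up with the catalog's type names, which may not hold for complex types. Names are compared case-insensitively.

```python
from typing import Dict, List, Optional


def fetch_table(database: str, name: str) -> Optional[dict]:
    """Return the Glue table definition, or None if the table does not exist."""
    import boto3  # imported lazily so diff_layout can be used without AWS access

    glue = boto3.client("glue")
    try:
        return glue.get_table(DatabaseName=database, Name=name)["Table"]
    except glue.exceptions.EntityNotFoundException:
        return None


def diff_layout(frame_types: Dict[str, str],
                partition_cols: List[str],
                table: dict) -> Dict[str, object]:
    """Compare DataFrame columns and partition keys with a Glue table definition."""
    frame = {k.lower(): v.lower() for k, v in frame_types.items()}
    parts = [p.lower() for p in partition_cols]
    # In the catalog, partition keys are stored separately from data columns.
    catalog = {c["Name"].lower(): c["Type"].lower()
               for c in table.get("StorageDescriptor", {}).get("Columns", [])}
    catalog_parts = [p["Name"].lower() for p in table.get("PartitionKeys", [])]
    data_cols = {k: v for k, v in frame.items() if k not in parts}
    shared = set(data_cols) & set(catalog)
    return {
        "missing_in_catalog": sorted(set(data_cols) - set(catalog)),
        "missing_in_frame": sorted(set(catalog) - set(data_cols)),
        "type_mismatch": sorted(k for k in shared if data_cols[k] != catalog[k]),
        "partition_keys_differ": parts != catalog_parts,
    }
```

With the names from the question, a call could look like `diff_layout(dict(final_data.dtypes), partition_cols, table)`, where `table` is the result of `fetch_table(conf.get_db_name(), conf.get_table_name_by_source(file_type))`. If `fetch_table` returns `None`, there is no existing definition to conflict with, and the access checks become the more likely direction.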

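In some cases this can also be an access problem. Check the job role's IAM permissions, and check whether Lake Formation is enabled and, if so, which permissions it grants to the role.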
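The Lake Formation side of that check can be scripted with boto3. This is a rough sketch under several assumptions: the helper names are hypothetical, `role_arn` stands for the Glue job's IAM role, pagination of `list_permissions` is ignored, and only the database-level grants and the account's default settings for new tables are inspected. It does not cover IAM itself, where the role would typically also need Glue catalog actions such as `glue:CreateTable` and `glue:UpdateTable`.

```python
from typing import List

IAM_ALLOWED = "IAM_ALLOWED_PRINCIPALS"


def iam_only_defaults(settings: dict) -> bool:
    """True if new tables default to IAM-only access control."""
    defaults = settings.get("CreateTableDefaultPermissions", [])
    return any(
        d.get("Principal", {}).get("DataLakePrincipalIdentifier") == IAM_ALLOWED
        and "ALL" in d.get("Permissions", [])
        for d in defaults
    )


def granted_permissions(entries: List[dict]) -> set:
    """Flatten the Permissions lists of list_permissions entries into one set."""
    return {p for e in entries for p in e.get("Permissions", [])}


def check_lake_formation(role_arn: str, database: str) -> dict:
    """Query Lake Formation for the job role's grants on the target database."""
    import boto3  # lazy import; AWS credentials are needed only at call time

    lf = boto3.client("lakeformation")
    settings = lf.get_data_lake_settings()["DataLakeSettings"]
    entries = lf.list_permissions(
        Principal={"DataLakePrincipalIdentifier": role_arn},
        ResourceType="DATABASE",
        Resource={"Database": {"Name": database}},
    )["PrincipalResourcePermissions"]
    perms = granted_permissions(entries)
    return {
        "iam_only_defaults": iam_only_defaults(settings),
        "role_permissions": sorted(perms),
        # Creating a table under Lake Formation needs CREATE_TABLE on the database.
        "can_create_table": bool({"CREATE_TABLE", "ALL"} & perms),
    }
```

If `iam_only_defaults` is false and `can_create_table` is also false, the job role probably cannot create the catalog table even though the S3 write succeeds, which would be consistent with Parquet files appearing without a table.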

Answered 5 months ago
