Glue ETL error: Cannot resolve column name "***" among ()

Hi, I'm relatively new to AWS Glue and I'm having trouble with a Glue ETL error. Strangely, the error occurs only in the Dev environment and not in Test, with the same code and configuration. I also tried calling .printSchema(), but its output doesn't show up in the logs.

Code:

JoinCheckinPartner_DF = JoinCheckinPartner_node1.toDF()
ApplyMappingCustomerAccessPass_DF = ApplyMappingCustomerAccessPass.toDF()
ApplyMappingAccessToken_DF = ApplyMappingAccessToken.toDF()

JoinCheckinPartnerCustomerAccessPass_DF = JoinCheckinPartner_DF.join(
    ApplyMappingCustomerAccessPass_DF,
    JoinCheckinPartner_DF.customer_access_pass_id == ApplyMappingCustomerAccessPass_DF.id_from_customeraccesspass_table,
    how='left_outer',
)

Error:

23/12/13 15:46:36 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last):
  File "/tmp/02562bed0b28de087112b67ae97bf8681ae397a0d0083f36be9bc5f3c6b350a6.py", line 495, in <module>
    JoinCheckinPartner_DF["customer_access_pass_id"] == ApplyMappingCustomerAccessPass_DF["id_from_customeraccesspass_table"]...

pyspark.sql.utils.AnalysisException: Cannot resolve column name "customer_access_pass_id" among ()

Glue Job Type: Spark
ETL Language: Python 3
Glue Version: Glue 3.0

1 Answer

Accepted Answer

I think the issue comes from mixing DynamicFrame and DataFrame (I understand you do that to be able to use a left join, which DynamicFrame does not support).
If no data actually reaches that point, the DataFrame you get from the conversion has an empty schema, so the column reference in the join fails.
You could check that data is present before reaching that point (e.g. count() > 0).
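
As a rough sketch of that check, something like the following could guard the conversion and the join. The helper names to_df_if_nonempty and left_join_or_skip are made up for illustration and are not part of the Glue or Spark APIs; the code relies only on DynamicFrame.count(), DynamicFrame.toDF() and DataFrame.join(). Returning None when either side is empty is an assumption about what the job should do in that case; you might prefer to log and exit, or to write an empty output instead.

```python
def to_df_if_nonempty(dynamic_frame):
    """Convert a DynamicFrame to a DataFrame, or return None if it has no records.

    Hypothetical helper: an empty DynamicFrame converts to a DataFrame with an
    empty schema, so any column lookup on the result would fail.
    """
    if dynamic_frame.count() == 0:
        return None
    return dynamic_frame.toDF()


def left_join_or_skip(left_dyf, right_dyf, left_key, right_key):
    """Left-outer join two DynamicFrames as DataFrames, or return None.

    Hypothetical helper: skips the join when either input is empty, because
    the join condition needs both key columns to exist in the schemas.
    """
    left_df = to_df_if_nonempty(left_dyf)
    right_df = to_df_if_nonempty(right_dyf)
    if left_df is None or right_df is None:
        return None
    condition = left_df[left_key] == right_df[right_key]
    return left_df.join(right_df, condition, how="left_outer")


# Usage inside the Glue job (names taken from the question):
#
# joined_df = left_join_or_skip(
#     JoinCheckinPartner_node1,
#     ApplyMappingCustomerAccessPass,
#     "customer_access_pass_id",
#     "id_from_customeraccesspass_table",
# )
# if joined_df is None:
#     print("No input data reached the join; skipping it.")
```

Note that count() triggers a Spark action, so on large inputs this check adds an extra pass over the data.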

AWS
Expert
Answered 5 months ago
  • Thanks so much, Gonzalo. This explains it.
