Glue ETL error: Cannot resolve column name "***" among ()


Hi, I'm relatively new to AWS Glue and I'm having trouble with a Glue ETL error. What makes it strange is that the error occurs only in the Dev environment, not in Test, even though both use the same code and configuration. I also tried calling .printSchema(), but its output doesn't show up in the logs.

Code:

JoinCheckinPartner_DF = JoinCheckinPartner_node1.toDF()
ApplyMappingCustomerAccessPass_DF = ApplyMappingCustomerAccessPass.toDF()
ApplyMappingAccessToken_DF = ApplyMappingAccessToken.toDF()

JoinCheckinPartnerCustomerAccessPass_DF = JoinCheckinPartner_DF.join(
    ApplyMappingCustomerAccessPass_DF,
    JoinCheckinPartner_DF.customer_access_pass_id == ApplyMappingCustomerAccessPass_DF.id_from_customeraccesspass_table,
    how = 'left_outer',
)

Error:

23/12/13 15:46:36 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last):
  File "/tmp/02562bed0b28de087112b67ae97bf8681ae397a0d0083f36be9bc5f3c6b350a6.py", line 495, in <module>
    JoinCheckinPartner_DF["customer_access_pass_id"] == ApplyMappingCustomerAccessPass_DF["id_from_customeraccesspass_table"]...

pyspark.sql.utils.AnalysisException: Cannot resolve column name "customer_access_pass_id" among ()

Glue Job Type: Spark ETL
Language: Python 3
Glue Version: Glue 3.0

Philip
Asked 5 months ago · 251 views
1 Answer
Accepted Answer

I think the issue comes from mixing DynamicFrame and DataFrame (I understand you do that to be able to use a left join, which is not supported on DynamicFrame).
A DynamicFrame derives its schema from the records it holds. If no data actually reaches that point, the DataFrame you get from toDF() has an empty schema, and resolving any column on it fails, which matches the empty list "among ()" in the error.
You could check that data is present before reaching that point (e.g. count() > 0).
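
A minimal sketch of such a guard, in case it helps. It assumes the inputs are ordinary PySpark DataFrames (the result of toDF()), and it checks df.columns instead of calling count(), since an empty column list is what triggers the error and reading it does not scan the data. The helper names missing_columns and left_join_if_resolvable are made up for illustration and are not part of Glue or PySpark.

```python
def missing_columns(df, required):
    """Return the required column names that are absent from df.columns."""
    present = set(df.columns)
    return [name for name in required if name not in present]


def left_join_if_resolvable(left_df, right_df, left_key, right_key):
    """Left-outer join two DataFrames, or return None if a key column is missing.

    A DynamicFrame with no records converts to a DataFrame with no columns,
    so this check catches that case before Spark tries to resolve the keys.
    """
    if missing_columns(left_df, [left_key]) or missing_columns(right_df, [right_key]):
        # Hypothetical policy: the caller decides whether to skip the step,
        # log a warning, or fail with a clearer message.
        return None
    return left_df.join(
        right_df,
        left_df[left_key] == right_df[right_key],
        how="left_outer",
    )
```

With the names from the question, the call would be left_join_if_resolvable(JoinCheckinPartner_DF, ApplyMappingCustomerAccessPass_DF, "customer_access_pass_id", "id_from_customeraccesspass_table"). A None result means one of the inputs arrived without the expected column, most likely because it was empty. If you prefer the count() > 0 check, it works the same way but has to scan the data.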

AWS
Expert
Answered 5 months ago
  • Thanks so much, Gonzalo. This explains it.
