Glue ETL error: Cannot resolve column name "***" among ()

Hi, I'm relatively new to AWS Glue and I'm having trouble with a Glue ETL error. What makes it strange is that the error occurs only in the Dev environment and not in Test, with the same code and configuration. I also tried calling .printSchema(), but its output doesn't show up in the logs.

Code:

JoinCheckinPartner_DF = JoinCheckinPartner_node1.toDF()
ApplyMappingCustomerAccessPass_DF = ApplyMappingCustomerAccessPass.toDF()
ApplyMappingAccessToken_DF = ApplyMappingAccessToken.toDF()

JoinCheckinPartnerCustomerAccessPass_DF = JoinCheckinPartner_DF.join(
    ApplyMappingCustomerAccessPass_DF,
    JoinCheckinPartner_DF.customer_access_pass_id == ApplyMappingCustomerAccessPass_DF.id_from_customeraccesspass_table,
    how='left_outer',
)

Error:

23/12/13 15:46:36 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last):
  File "/tmp/02562bed0b28de087112b67ae97bf8681ae397a0d0083f36be9bc5f3c6b350a6.py", line 495, in <module>
    JoinCheckinPartner_DF["customer_access_pass_id"] == ApplyMappingCustomerAccessPass_DF["id_from_customeraccesspass_table"]...

pyspark.sql.utils.AnalysisException: Cannot resolve column name "customer_access_pass_id" among ()

Glue Job Type: Spark
ETL Language: Python 3
Glue Version: Glue 3.0

1 Answer

Accepted Answer

I think the issue is the mix of DynamicFrame and DataFrame (I understand you do that to get a left join, which DynamicFrame doesn't support).
If no data actually reaches that point, the DynamicFrame has no schema, so the DataFrame you get from toDF() has no columns and the join fails. The empty "among ()" in the error message is that empty column list.
You could check that data is present before reaching that point (e.g. count() > 0).
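
As a rough illustration of that kind of guard, here is a minimal sketch that checks for the join keys before joining. The helper names missing_columns and safe_left_join are hypothetical (they are not part of Glue or PySpark), and returning None when a key is missing is only one possible policy; the sketch relies only on the DataFrame's columns attribute, column indexing, and join().

```python
def missing_columns(df, required):
    """Return the names in `required` that are absent from df.columns."""
    present = set(df.columns)
    return [name for name in required if name not in present]


def safe_left_join(left_df, right_df, left_key, right_key):
    """Left-outer-join two DataFrames, or return None if a join key is missing.

    Hypothetical helper: a DataFrame converted from an empty DynamicFrame
    has no columns, so the key check catches that case before Spark raises
    "Cannot resolve column name".
    """
    if missing_columns(left_df, [left_key]) or missing_columns(right_df, [right_key]):
        # Assumption: the caller decides what to do when there is nothing to join.
        return None
    return left_df.join(
        right_df,
        left_df[left_key] == right_df[right_key],
        how="left_outer",
    )
```

In the job above, this would be called as safe_left_join(JoinCheckinPartner_DF, ApplyMappingCustomerAccessPass_DF, "customer_access_pass_id", "id_from_customeraccesspass_table"), with the rest of the job skipped or short-circuited when it returns None. Checking the columns avoids a full count() pass over the data, although count() > 0 on the DynamicFrame works as well.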

AWS Expert, answered 5 months ago
  • Thanks so much, Gonzalo. This explains it.