Glue ETL error: Cannot resolve column name "***" among ()

Hi, I'm relatively new to AWS Glue and I'm having trouble with a Glue ETL error. What makes it strange is that the error occurs only in the Dev environment and not in Test, with the same code and configuration. I also tried calling .printSchema(), but its output doesn't show up in the logs.

Code:

    JoinCheckinPartner_DF = JoinCheckinPartner_node1.toDF()
    ApplyMappingCustomerAccessPass_DF = ApplyMappingCustomerAccessPass.toDF()
    ApplyMappingAccessToken_DF = ApplyMappingAccessToken.toDF()

    JoinCheckinPartnerCustomerAccessPass_DF = JoinCheckinPartner_DF.join(
        ApplyMappingCustomerAccessPass_DF,
        JoinCheckinPartner_DF.customer_access_pass_id == ApplyMappingCustomerAccessPass_DF.id_from_customeraccesspass_table,
        how='left_outer',
    )

Error:

    23/12/13 15:46:36 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last):
      File "/tmp/02562bed0b28de087112b67ae97bf8681ae397a0d0083f36be9bc5f3c6b350a6.py", line 495, in <module>
        JoinCheckinPartner_DF["customer_access_pass_id"] == ApplyMappingCustomerAccessPass_DF["id_from_customeraccesspass_table"]...

pyspark.sql.utils.AnalysisException: Cannot resolve column name "customer_access_pass_id" among ()

Glue Job Type: Spark ETL
Language: Python 3
Glue Version: Glue 3.0

1 Answer

Accepted Answer

I think the issue is the mix of DynamicFrame and DataFrame (I understand you convert to DataFrame so you can do a left join, which is not supported on DynamicFrame).
A DynamicFrame takes its schema from the records it holds, so if no actual data reaches that point, the DataFrame you get from toDF() has an empty schema and the column lookup in the join condition fails. The empty () at the end of the error message is the list of columns Spark found.
You could check that data is present before reaching that point (e.g. count() > 0).
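As a rough sketch of that check: the helper below converts a DynamicFrame only when it holds at least one record. The name to_df_if_nonempty and the skip-when-empty behavior are my own assumptions for illustration, not part of the Glue API; it relies only on the count() and toDF() methods of DynamicFrame.

```python
def to_df_if_nonempty(dynamic_frame, name):
    """Return dynamic_frame.toDF(), or None when the frame has no records.

    Hypothetical helper (not a Glue API). It only needs an object that
    exposes count() and toDF(), as a Glue DynamicFrame does.
    """
    if dynamic_frame.count() == 0:
        # An empty DynamicFrame converts to a DataFrame with no columns,
        # so any later column reference would fail to resolve.
        print(f"{name} has no records; skipping conversion")
        return None
    return dynamic_frame.toDF()


# Intended use inside the job, with the names from the question:
#
#   left_df = to_df_if_nonempty(JoinCheckinPartner_node1, "JoinCheckinPartner_node1")
#   right_df = to_df_if_nonempty(ApplyMappingCustomerAccessPass, "ApplyMappingCustomerAccessPass")
#   if left_df is not None and right_df is not None:
#       joined = left_df.join(
#           right_df,
#           left_df["customer_access_pass_id"] == right_df["id_from_customeraccesspass_table"],
#           how="left_outer",
#       )
```

Note that count() triggers a Spark job, so on large inputs this check has a cost. What the job should do when one side is empty (skip the join, or pass the left side through unchanged) depends on your pipeline.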

AWS EXPERT, answered 5 months ago
  • Thanks so much, Gonzalo. This explains it.
