
pyspark.sql.utils.AnalysisException: Cannot resolve column name "previous_project_id" among ()


Hi, I'm relatively new to AWS Glue and am having trouble with the following transformation code:

DataSource4 = glueContext.create_dynamic_frame.from_catalog(database = "beta", table_name = "[table_name]", transformation_ctx = "DataSource4")
Transform9 = ApplyMapping.apply(frame = DataSource4, mappings = [("project_unique_id", "int", "(src) project_unique_id", "int")], transformation_ctx = "Transform9")

Transform9DF = Transform9.toDF()
Transform3DF = Transform3.toDF()
Transform12 = DynamicFrame.fromDF(Transform3DF.join(Transform9DF, (Transform3DF['project_unique_id'] == Transform9DF['previous_project_id']), "leftanti"), glueContext, "Transform12")

The job is failing with the error: raise AnalysisException(s.split(': ', 1)[1], stackTrace) 'Cannot resolve column name "previous_project_id" among ((src) project_unique_id);'. On checking the tables, both columns "project_unique_id" and "previous_project_id" are filled with NULL values; could that be the reason for the above error?
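As background on what the join in the job is meant to do: a "leftanti" join keeps only the left-side rows whose key matches no row on the right. Here is a minimal plain-Python sketch of that semantics (illustrative only; the real job runs this inside Spark, and NULLs never match under SQL comparison rules, so left rows with a NULL key are kept):

```python
# Sketch of "leftanti" join semantics in plain Python.
# A left anti join returns left rows with NO matching key on the right;
# a NULL (None) key never equals anything, so NULL left rows survive.

def left_anti_join(left_rows, right_rows, left_key, right_key):
    # Collect the non-NULL join keys present on the right side.
    right_keys = {r[right_key] for r in right_rows if r[right_key] is not None}
    return [
        row for row in left_rows
        if row[left_key] is None or row[left_key] not in right_keys
    ]

left = [
    {"project_unique_id": 1},
    {"project_unique_id": 2},
    {"project_unique_id": None},
]
right = [
    {"previous_project_id": 2},
    {"previous_project_id": None},
]

kept = left_anti_join(left, right, "project_unique_id", "previous_project_id")
print([r["project_unique_id"] for r in kept])  # -> [1, None]
```

So NULL values alone would not raise an AnalysisException; they only affect which rows match.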

1 Answer

Hi,

Could you please clarify the statement:

on checking the tables, both columns "project_unique_id" and "previous_project_id" are filled with NULL values

Which table: the source table? Or which DataFrame: Transform9DF or Transform3DF?

Could you post the schema of these two DataFrames?

I might be mistaken, but I think the ApplyMapping you are using drops every field other than "(src) project_unique_id", so when you try to join on Transform9DF['previous_project_id'], that field no longer exists. That would also explain the error message, which lists "(src) project_unique_id" as the only available column.
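To illustrate the idea, here is a minimal plain-Python sketch (not Glue itself) of ApplyMapping's behavior: only fields listed in the mappings survive, renamed as specified. The second mapping shown is a hypothetical fix, carrying previous_project_id through alongside the renamed column:

```python
# Sketch of Glue ApplyMapping semantics in plain Python (illustrative only).
# mappings entries are (source_name, source_type, target_name, target_type);
# any field not listed in mappings is dropped from the output.

def apply_mapping(rows, mappings):
    return [
        {tgt: row.get(src) for src, _stype, tgt, _ttype in mappings}
        for row in rows
    ]

rows = [{"project_unique_id": 7, "previous_project_id": 3}]

# Mapping only project_unique_id, as in the failing job:
narrow = apply_mapping(
    rows, [("project_unique_id", "int", "(src) project_unique_id", "int")]
)
print(list(narrow[0]))  # -> ['(src) project_unique_id']; previous_project_id is gone

# Hypothetical fix: also list previous_project_id in the mappings
# so the later join can still resolve it:
wide = apply_mapping(rows, [
    ("project_unique_id", "int", "(src) project_unique_id", "int"),
    ("previous_project_id", "int", "previous_project_id", "int"),
])
print(sorted(wide[0]))  # -> ['(src) project_unique_id', 'previous_project_id']
```

The same applies in the real job: adding a ("previous_project_id", "int", "previous_project_id", "int") entry to the mappings list in Transform9 (assuming that column exists in the source table) should let the join resolve the column.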

Thank you

answered 4 months ago
