AWS Glue script: toDF().sort() method gives exception


Hi All,

I am facing this issue while running a PySpark script in an AWS Glue job.

The code is as follows:

dynamic_frame.toDF().orderBy(["col1", "col2"])

This code raises the error AnalysisException: cannot resolve 'col1' given input columns: []; even though the DynamicFrame has 200 columns. The error only appears after converting it to a DataFrame. The same code works fine in a Jupyter notebook.

Please advise how to solve this problem.

asked 2 years ago, 1296 views
1 Answer

Hello,

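The exception above generally occurs when Spark cannot find the columns referenced in the sort condition in the dataset. Here, the empty list in "given input columns: []" means that the DataFrame produced by toDF() had no columns at all when orderBy was applied.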

To confirm, I tested the sort and orderBy functions in a Glue job and they work as expected. Here is the sample code:

++++++++++
# glueContext is created by the standard Glue job boilerplate
datasource0 = glueContext.create_dynamic_frame.from_catalog(database="testdb", table_name="nycflights13_csv", transformation_ctx="datasource0")

datasource0.toDF().sort('year', 'month').show(5)

datasource0.toDF().orderBy('year', 'month').show(5)
++++++++++

Please verify the schema again and print some sample data after creating the DynamicFrame, then apply sort or orderBy:

++++++++++
dynamic_frame.printSchema()
# The line above should print the columns you want to use in sort or orderBy

dynamic_frame.toDF().show()
# The line above should return rows of data

dynamic_frame.toDF().sort('year', 'month').show(5)

dynamic_frame.toDF().orderBy('year', 'month').show(5)
++++++++++
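The same checks can be run programmatically just before sorting, so the job fails with a readable message instead of a bare AnalysisException. This is a minimal sketch, not an AWS Glue or PySpark API: order_by_checked is a hypothetical helper, and it assumes only that the object passed in has a columns list and an orderBy(*cols) method, as a Spark DataFrame returned by toDF() does.

```python
from typing import Sequence


def order_by_checked(df, columns: Sequence[str]):
    """Sort `df` by `columns`, raising a descriptive error if a column is missing.

    Hypothetical helper (not part of AWS Glue or PySpark). `df` is any object
    with a `.columns` list and an `.orderBy(*cols)` method, such as the Spark
    DataFrame returned by `DynamicFrame.toDF()`. Only top-level column names
    are checked; nested fields like "a.b" are not handled.
    """
    available = list(df.columns)
    if not available:
        # Corresponds to "given input columns: []": the DataFrame has no columns.
        raise ValueError(
            "DataFrame has no columns; check printSchema() and sample data "
            "on the DynamicFrame before sorting"
        )
    # Spark resolves column names case-insensitively by default
    # (spark.sql.caseSensitive=false), so compare lower-cased names.
    known = {name.lower() for name in available}
    missing = [c for c in columns if c.lower() not in known]
    if missing:
        raise ValueError(f"sort columns {missing} not found; available: {available}")
    return df.orderBy(*columns)
```

For example, order_by_checked(dynamic_frame.toDF(), ["col1", "col2"]).show(5) either sorts as usual or reports which columns are missing and which ones the DataFrame actually has.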

If you still face any issues, please feel free to reach out to AWS Premium Support with sample data and we will be happy to help.

Have a nice day!

AWS
answered 2 years ago
  • Hi @Shubham_P, is there a way to sort() or orderBy() a DynamicFrame without going through .toDF()?

    Thanks
