AWS Glue script toDF().sort() method gives exception

Hi All,

I am facing this issue while running a PySpark script in an AWS Glue job.

The code is as follows:

DyanmicFrame.toDF().orderBy(["col1", "col2"])

This code fails with the error AnalysisException: cannot resolve 'col1' given input columns: []; even though the DynamicFrame has 200 columns. The error only appears after conversion to a DataFrame. The same code works fine in a Jupyter notebook.

Please advise how to solve this problem.

Asked 2 years ago, viewed 1319 times
1 Answer

Hello,

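This exception generally occurs when Spark cannot find the columns named in the sort condition in the dataset. The empty list in "given input columns: []" indicates that the DataFrame being sorted has no columns at all.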

To confirm, I tested the sort and orderBy functions in a Glue job, and both work as expected. Please find the sample code below:

++++++++++
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "testdb", table_name = "nycflights13_csv", transformation_ctx = "datasource0")

datasource0.toDF().sort('year','month').show(5)

datasource0.toDF().orderBy('year','month').show(5)
++++++++++

Please verify the schema once again: print the schema and some sample data right after creating the dynamic frame, and only then call sort or orderBy:

++++++++++
DyanmicFrame.printSchema()

## The call above should print the columns you want to use in sort or orderBy

DyanmicFrame.toDF().show()

## The call above should return rows

DyanmicFrame.toDF().sort('year','month').show(5)

DyanmicFrame.toDF().orderBy('year','month').show(5)
++++++++++
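The checks above can also be folded into a small guard that fails early with a readable message instead of an AnalysisException. This is only a sketch, not part of the Glue or Spark API: the helper name order_by_checked is hypothetical, and it relies only on the DataFrame's columns attribute and orderBy method. It also assumes Spark's default case-insensitive column resolution.

```python
def order_by_checked(df, sort_cols):
    """Sort `df` by `sort_cols`, raising a clear error if columns are missing.

    `df` is expected to be a Spark DataFrame, for example the result of
    DyanmicFrame.toDF(). Only `df.columns` and `df.orderBy` are used.
    """
    available = list(df.columns)
    if not available:
        # Corresponds to the "given input columns: []" case in the question.
        raise ValueError(
            "DataFrame has no columns; check the schema of the source DynamicFrame"
        )
    # Assumption: Spark resolves column names case-insensitively by default
    # (spark.sql.caseSensitive=false), so names are compared lower-cased.
    known = {name.lower() for name in available}
    missing = [c for c in sort_cols if c.lower() not in known]
    if missing:
        raise ValueError(
            f"Sort columns not found: {missing}; available columns: {available}"
        )
    return df.orderBy(*sort_cols)
```

In a Glue job this might be called as order_by_checked(DyanmicFrame.toDF(), ["col1", "col2"]). If the first error is raised, the problem lies in how the DynamicFrame was created rather than in the sort itself.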

If you still face any issues, please feel free to reach out to AWS Premium Support with sample data and we will be happy to help.

Have a nice day!

AWS
Answered 2 years ago
  • Hi @Shubham_P, is there a way to sort() or orderBy() a DynamicFrame without going through .toDF()?

    Thanks
