Hi Kevin,
If you are using the visual editor, there is no out-of-the-box transform for ordering your data, so you have to create your own. The simplest way is to add a custom SQL Query transform.
Click on (+) to add a node and select "SQL Query", then write a simple query such as "SELECT * FROM .... ORDER BY <column>".
This generates the following script, which runs a Spark SQL query:
    def sparkSqlQuery(glueContext, query, mapping, transformation_ctx) -> DynamicFrame:
        for alias, frame in mapping.items():
            frame.toDF().createOrReplaceTempView(alias)
        result = spark.sql(query)
        return DynamicFrame.fromDF(result, glueContext, transformation_ctx)

    # Script generated for node SQL Query
    SqlQuery0 = """
    select * from myDataSource
    order by <myDataSourceColumn>
    """
    SQLQuery_node1692843137953 = sparkSqlQuery(
        glueContext,
        query=SqlQuery0,
        mapping={"myDataSource": ChangeSchema_node2},
        transformation_ctx="SQLQuery_node1692843137953",
    )
Alternatively, if you are writing your own script, you can convert your DynamicFrame to a Spark DataFrame and then sort the data using the Spark API [1]:
sorted_df = myframe.toDF().orderBy(["mycolumn"])
I hope this helps. If you have further questions, please let me know.
Reference: [1] https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.orderBy.html
    i = glueContext.create_dynamic_frame_from_options(
        connection_type="s3",
        connection_options={
            "paths": [input_loc],
            "recurse": True,
            "compressionType": "gzip",
            "groupFiles": "inPartition",
            "groupSize": "104857600",
        },
        format="json",
    )

I plan on loading the JSON using the grouping settings above.
If I sort by a column on the dynamic data frame:
sorted_df = i.toDF().orderBy(["col"])
and then write it out as Parquet, will the data only be sorted by that column within each Parquet file? I would instead like the column to be sorted "across" the Parquet files, if that makes sense.
Something like "z-ordering"?
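To make the "within" versus "across" distinction concrete, here is a small pure-Python sketch (no Spark or Glue involved). The lists stand in for Parquet files, and the helper names `sort_within_files`, `sort_across_files` and `ranges_overlap` are hypothetical, made up for this illustration. As far as I understand, `orderBy` on a Spark DataFrame performs a global sort, so it should behave like the "across" case as long as the data is not repartitioned before the write, whereas `sortWithinPartitions` corresponds to the "within" case; please treat that as an assumption to verify on your own job.

```python
from typing import List


def sort_within_files(files: List[List[int]]) -> List[List[int]]:
    """Sort each 'file' independently; value ranges may still overlap."""
    return [sorted(f) for f in files]


def sort_across_files(files: List[List[int]], num_files: int) -> List[List[int]]:
    """Sort all values globally, then split them into contiguous chunks."""
    all_values = sorted(v for f in files for v in f)
    # Ceiling division, with a floor of 1 so an empty input does not break range().
    size = max(1, -(-len(all_values) // num_files))
    return [all_values[i:i + size] for i in range(0, len(all_values), size)]


def ranges_overlap(files: List[List[int]]) -> bool:
    """Return True if any two 'files' cover overlapping value ranges.

    Assumes every file is already sorted internally.
    """
    ordered = sorted((f for f in files if f), key=lambda f: f[0])
    return any(prev[-1] > nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


# Three hypothetical input 'files' with interleaved values.
files = [[7, 1, 9], [4, 8, 2], [6, 3, 5]]

within = sort_within_files(files)
across = sort_across_files(files, num_files=3)

print(within, ranges_overlap(within))
print(across, ranges_overlap(across))
```

With a global sort, each output chunk covers its own range of values, so a reader filtering on that column can skip whole files. Note that this is ordering on a single column; z-ordering usually refers to clustering on several columns at once, which this sketch does not attempt.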
Are you using the visual editor, or do you have a script?