Glue DynamicFrame show method yields nothing
I'm using a notebook together with a Glue Dev Endpoint to load data from S3 into a Glue DynamicFrame. The printSchema method works fine, but the show method yields nothing although the DynamicFrame is not empty. Converting the DynamicFrame into a Spark DataFrame does yield a result (df.toDF().show()). Here is the dummy code that I'm using:
from awsglue.context import GlueContext

# `spark` is the SparkSession provided by the notebook
glueContext = GlueContext(spark.sparkContext)
df = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://bucketname/filename"]},
    format="json",
    format_options={"multiline": True},
)
df.printSchema()
df.show()
This problem was posted on Stack Overflow before: https://stackoverflow.com/questions/56013334/spark-dynamic-frame-show-method-yields-nothing
Why does the show method yield nothing? Is this a bug, or is there something that I'm missing to make this work?
Hello,
A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on the fly when required. Because a Glue DynamicFrame is built on RDDs, the show() method does not work directly; you need to convert the DynamicFrame to a DataFrame first to view the data in tabular format:
dyf.printSchema()
dyf.toDF().show()
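If you need this conversion in several notebook cells, it could be wrapped in a small helper. The following is only a minimal sketch: `preview` is a hypothetical name and not part of the awsglue API. It assumes nothing beyond the two calls used above, `toDF()` on the DynamicFrame and `show()` on the resulting Spark DataFrame.

```python
def preview(dyf, num_rows=20, truncate=False):
    """Show the first rows of a DynamicFrame by going through a Spark DataFrame.

    Hypothetical helper (not part of awsglue). `dyf` is expected to be a
    Glue DynamicFrame, or any object that exposes a `toDF()` method.
    """
    spark_df = dyf.toDF()                        # DynamicFrame -> Spark DataFrame
    spark_df.show(num_rows, truncate=truncate)   # tabular output of the first rows
    return spark_df                              # returned so it can be reused
```

With the frame from the question, `preview(df, 5)` would display the first five rows and return the Spark DataFrame for further inspection.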
Converting the Glue DynamicFrame to a Spark DataFrame and calling show on that is, from my point of view, a workaround. According to the AWS documentation, Glue DynamicFrames are supposed to have a show method as well: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-show
But this method does not work, so this seems to be a bug. Will AWS provide a fix for it?