Creating a Delta table from Spark using the Glue Catalog


I am trying to create a Delta table from Spark SQL using the AWS Glue Data Catalog as the metastore. I can correctly query an existing Delta table through the Glue metastore:

%%sql
select * from `my_table` VERSION AS OF 1 limit 2 

The issues start when I try to create a new Delta Table in the same catalog.

%%sql
CREATE TABLE `my_table_v1` USING DELTA 
AS (select * from `my_table` VERSION AS OF 1 limit 2 )

##
# IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3://my_s3_bucketmy_table_v1

Considering that the Glue database has location s3://my_s3_bucket, it looks like there is some issue in computing the full path where the table files must be written: the table name is appended to the bucket without a path separator.

I then tried to specify the LOCATION of the table, creating an external table:

%%sql
CREATE TABLE `my_table_v1` USING DELTA 
LOCATION 's3://my_s3_bucket/my_table_v1'
AS (select * from `my_table` VERSION AS OF 1 limit 2 )

##
# IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3://my_s3_bucketmy_table_v1-__PLACEHOLDER__

This way the data files are correctly written to s3://my_s3_bucket/my_table_v1, but it then fails when registering the table in the Glue database.

Any suggestions?

  • I am able to use Spark SQL (CREATE OR REPLACE TABLE USING DELTA AS SELECT * FROM TempVIEW).

    Setting the database location to an appropriate S3 path will resolve the issue: in the Glue catalog, go to the database and edit it so that it has an S3 path location. Use Glue version 4.0 or later.

    We don't need to put the location in the Spark code (see the sketch after this comment for making the same edit through the Glue API).
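For illustration, here is a minimal sketch of that database edit done programmatically, assuming boto3 access to the Glue Data Catalog. The database name my_database is a placeholder (the question never names it), and the trailing slash is a guess based on the concatenated path shown in the error above.

import boto3

# Hypothetical sketch: point the Glue database at an explicit S3 location so
# that Spark can derive per-table paths from it. "my_database" is a placeholder.
glue = boto3.client("glue")

glue.update_database(
    Name="my_database",
    DatabaseInput={
        "Name": "my_database",
        # The error s3://my_s3_bucketmy_table_v1 suggests the table name is
        # appended without a separator, so end the location with a slash.
        "LocationUri": "s3://my_s3_bucket/",
    },
)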

2 answers

That is not supported. Even if you solve the path issues, the result is not a valid Delta/symlink table; it will create a generic Hive table with an array field.
The OSS Delta community is working to rectify that. In the meantime, you can register Delta tables in the catalog using a Glue crawler or the DeltaTable "generate" API, but not directly via Spark SQL.
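As an illustration of the "generate" route, here is a minimal PySpark sketch, assuming the delta-spark package is available and spark is the active SparkSession; it uses the table path from the question and only produces the symlink manifest, which can then be registered with a crawler or external DDL rather than via Spark SQL.

from delta.tables import DeltaTable

# Sketch only: build a symlink manifest for an existing Delta table so that
# catalog-based engines can query it. The path is the one from the question.
delta_table = DeltaTable.forPath(spark, "s3://my_s3_bucket/my_table_v1")

# Writes the _symlink_format_manifest/ folder under the table path.
delta_table.generate("symlink_format_manifest")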

AWS
EXPERT
answered 9 months ago
EXPERT
verified 14 days ago

Thanks for confirming it. If it is not supported, the first example on this doc page https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html should be corrected, as it produces the exact same issue. It basically attempts to do the same thing using PySpark syntax:

"Example: Write a Delta Lake table to Amazon S3 and register it to the AWS Glue Data Catalog
The following AWS Glue ETL script demonstrates how to write a Delta Lake table to Amazon S3 and register the table to the AWS Glue Data Catalog."
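The doc example boils down to a Spark write of the shape sketched below. This is not a copy of the doc: names and options are placeholders, and the session configuration assumes the Delta Lake settings that a Glue 4.0 job with --datalake-formats delta would otherwise apply.

from pyspark.sql import SparkSession

# Illustrative sketch of the pattern the linked doc example follows: write Delta
# data to S3 and register the table in the Glue Data Catalog in a single call.
spark = (
    SparkSession.builder
    # Assumed Delta configuration; on Glue these values come from the job
    # parameters rather than from code.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

df = spark.sql("select * from `my_table` VERSION AS OF 1 limit 2")

(
    df.write.format("delta")
    .option("path", "s3://my_s3_bucket/my_table_v1")  # explicit table location
    .mode("overwrite")
    .saveAsTable("my_table_v1")                       # catalog registration step
)

As the answer above notes, it is this saveAsTable registration that hits the same limitation described in the question.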

answered 9 months ago
