Using Glue Interactive Session with Delta-Spark Package


Hi,

Has anyone managed to make the delta-spark Python package work with a Glue interactive session? It works when I submit a Glue 4.0 job, but not in an interactive session. Below are the code snippets I used to configure the interactive session.

I always get one of two errors: "ModuleNotFoundError: No module named 'pyspark'" or "Py4JError: An error occurred while calling z:io.delta.tables.DeltaTable.forPath".

import delta
import s3fs

# This is the call that fails with the Py4JError above
deltaTable = delta.tables.DeltaTable.forPath(spark, 's3a://irelandyq/delta/')
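
As a sanity check, the same path should also be readable through the plain DataFrame API, which only needs the Delta JARs and Spark confs on the cluster rather than the delta-spark Python package (this is a sketch I have not verified in the failing session):

# Read the Delta table via the DataFrame API; this works whenever the
# Delta JARs and Spark confs are in place, independent of the pip package.
df = spark.read.format("delta").load("s3a://irelandyq/delta/")
df.show(5)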

Below is how I configured the Glue/Spark session. Attempt 1 uses the interactive-session magics:

%%configure
{
    "datalake-formats": "delta",
    "conf": "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
}


%glue_version 4.0
%worker_type G.1X
%number_of_workers 5
%additional_python_modules delta-spark==2.4.0
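
To rule out the pip install step as the cause of the ModuleNotFoundError, here is a quick check (my addition, not part of the original session) that delta-spark actually made it into the session's Python environment:

import importlib.metadata

# Raises PackageNotFoundError if %additional_python_modules did not
# install delta-spark into the session's Python environment.
print(importlib.metadata.version("delta-spark"))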

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
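
And a verification cell (again my addition) to confirm the %%configure settings actually reached the session before it started:

# Inspect the runtime config; both values should match the %%configure cell.
print(spark.conf.get("spark.sql.extensions", "<not set>"))
print(spark.conf.get("spark.sql.catalog.spark_catalog", "<not set>"))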

Attempt 2, using the SparkSession builder:

# Build list of packages entries using Maven coordinates (groupId:artifactId:version)
pkg_list = []
pkg_list.append("io.delta:delta-core_2.12:1.1.0")
pkg_list.append("org.apache.hadoop:hadoop-aws:3.2.2")

packages = ",".join(pkg_list)
print("packages: " + packages)

from pyspark.sql import SparkSession

# Instantiate Spark via builder
# Note: we use the `ContainerCredentialsProvider` to give us access to underlying IAM role permissions

spark = (SparkSession
    .builder
    .appName("PySparkApp")
    .config("spark.jars.packages", packages)
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    # Hadoop options set through the Spark conf need the spark.hadoop. prefix
    # to propagate into the Hadoop configuration
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "com.amazonaws.auth.ContainerCredentialsProvider")
    .getOrCreate())

sc = spark.sparkContext
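
For completeness, delta-spark also ships a documented helper that adds the delta-core Maven coordinate matching the installed Python package onto the builder. A minimal sketch of that variant (which I have not tried in the interactive session, and which assumes delta-spark was installed via %additional_python_modules):

from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (SparkSession
    .builder
    .appName("PySparkApp")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog"))

# configure_spark_with_delta_pip appends the delta-core JAR coordinate
# that matches the installed delta-spark version to spark.jars.packages.
spark = configure_spark_with_delta_pip(builder).getOrCreate()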
yiquan
asked 9 months ago · 45 views
No Answers
