Using Glue Interactive Session with Delta-Spark Package


Hi,

Has anyone managed to make the delta-spark Python package work with a Glue interactive session? It works when I submit a Glue 4.0 job, but not in an interactive session. Below are a few code snippets that I used to configure the interactive session.

I always end up with one of two errors: "ModuleNotFoundError: No module named 'pyspark'" or "Py4JError: An error occurred while calling z:io.delta.tables.DeltaTable.forPath".

import delta
import s3fs
deltaTable = delta.tables.DeltaTable.forPath(spark, 's3a://irelandyq/delta/')
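
A quick import check like the one below (purely diagnostic, nothing Delta-specific) is what I use to see where the modules resolve from inside the session:

# Purely diagnostic: confirm which pyspark and delta the session actually imports.
import pyspark
import delta
print(pyspark.__version__, pyspark.__file__)
print(delta.__file__)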

Attempt 1: configuring the session with the Glue %%configure magic and the GlueContext.

%%configure
{
    "datalake-formats": "delta",
    "conf": "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
}


%glue_version 4.0
%worker_type G.1X
%number_of_workers 5
%additional_python_modules delta-spark==2.4.0

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
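
With the session created this way, my understanding is that Glue 4.0's native Delta support (enabled by the "datalake-formats" setting above) should also allow reading the table through the plain DataFrame API. A minimal sketch, using the same S3 path as in the failing snippet:

# Read via the DataFrame reader instead of the DeltaTable helper.
# Relies on Glue 4.0's built-in Delta support from the %%configure cell above.
df = spark.read.format("delta").load("s3a://irelandyq/delta/")
df.show(5)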

Attempt 2: using the SparkSession builder.

# Build the list of package entries using Maven coordinates (groupId:artifactId:version)
pkg_list = [
    "io.delta:delta-core_2.12:1.1.0",
    "org.apache.hadoop:hadoop-aws:3.2.2",
]

packages = ",".join(pkg_list)
print("packages: " + packages)

from pyspark.sql import SparkSession

# Instantiate Spark via the builder.
# Note: ContainerCredentialsProvider is used to pick up the underlying IAM role permissions.
spark = (
    SparkSession.builder
    .appName("PySparkApp")
    .config("spark.jars.packages", packages)
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.ContainerCredentialsProvider")
    .getOrCreate()
)

sc = spark.sparkContext
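
For completeness, this is the kind of minimal smoke test I would expect to pass if the jars had actually loaded (the scratch S3 path here is just a made-up example):

# Write a tiny Delta table and read it back to confirm the Delta classes load.
# "s3a://irelandyq/delta-smoke/" is a hypothetical scratch location.
spark.range(5).write.format("delta").mode("overwrite").save("s3a://irelandyq/delta-smoke/")
spark.read.format("delta").load("s3a://irelandyq/delta-smoke/").show()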
yiquan