spark.sql not working on EMR (Serverless)


The following script does not create the table in the S3 location specified in the query when it runs on EMR Serverless. When I test the same script locally, the Delta transaction log JSON file is created and contains the information about the created table.

from pyspark.sql import SparkSession

spark = (SparkSession
    .builder
    .enableHiveSupport()
    .appName('omop_ddl')
    .getOrCreate()
    )


spark.sql("""
CREATE OR REPLACE TABLE CONCEPT (
  CONCEPT_ID LONG,
  CONCEPT_NAME STRING,
  DOMAIN_ID STRING,
  VOCABULARY_ID STRING,
  CONCEPT_CLASS_ID STRING,
  STANDARD_CONCEPT STRING,
  CONCEPT_CODE STRING,
  VALID_START_DATE DATE,
  VALID_END_DATE DATE,
  INVALID_REASON STRING
) USING DELTA
LOCATION 's3a://ls-dl-mvp-s3deltalake/health_lakehouse/silver/concept';
""")

The job is submitted with the following configuration parameters:

--conf spark.jars=s3a://ls-dl-mvp-s3development/spark_jars/delta-core_2.12-2.1.0.jar,s3a://ls-dl-mvp-s3development/spark_jars/delta-storage-2.1.0.jar 
--conf spark.executor.cores=1 
--conf spark.executor.memory=4g 
--conf spark.driver.cores=1 
--conf spark.driver.memory=4g 
--conf spark.executor.instances=1 
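
For reference, the local session where this works is configured roughly like this (a sketch, not my exact setup; the io.delta.sql.DeltaSparkSessionExtension and DeltaCatalog settings are the standard ones from the Delta Lake documentation and are not passed explicitly in the EMR Serverless job above):

from pyspark.sql import SparkSession

# Local sketch: standard Delta Lake SQL settings wired directly into the builder.
spark = (SparkSession
    .builder
    .appName('omop_ddl_local')
    .config('spark.sql.extensions', 'io.delta.sql.DeltaSparkSessionExtension')
    .config('spark.sql.catalog.spark_catalog',
            'org.apache.spark.sql.delta.catalog.DeltaCatalog')
    .getOrCreate()
    )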

I also tried modifying the location in the query to point at a non-existent bucket, and the script still did not raise an error. Am I forgetting something? Thank you very much for your help.

Asked 2 years ago · 150 views
No answers
