spark.sql not working on EMR (Serverless)

The following script does not create the table at the S3 location given in the query when it runs on EMR Serverless. When I run it locally, the Delta JSON file is created and contains the information about the new table.

from pyspark.sql import SparkSession

spark = (SparkSession
    .builder
    .enableHiveSupport()
    .appName('omop_ddl')
    .getOrCreate()
    )


spark.sql("""
CREATE OR REPLACE TABLE CONCEPT (
  CONCEPT_ID LONG,
  CONCEPT_NAME STRING,
  DOMAIN_ID STRING,
  VOCABULARY_ID STRING,
  CONCEPT_CLASS_ID STRING,
  STANDARD_CONCEPT STRING,
  CONCEPT_CODE STRING,
  VALID_START_DATE DATE,
  VALID_END_DATE DATE,
  INVALID_REASON STRING
) USING DELTA
LOCATION 's3a://ls-dl-mvp-s3deltalake/health_lakehouse/silver/concept';
""")

The configuration parameters are as follows:

--conf spark.jars=s3a://ls-dl-mvp-s3development/spark_jars/delta-core_2.12-2.1.0.jar,s3a://ls-dl-mvp-s3development/spark_jars/delta-storage-2.1.0.jar 
--conf spark.executor.cores=1 
--conf spark.executor.memory=4g 
--conf spark.driver.cores=1 
--conf spark.driver.memory=4g 
--conf spark.executor.instances=1

I also tried changing the location in the query to a bucket that does not exist, and the script still completed without raising an error. Am I forgetting something? Thank you very much for your help.
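For reference, here is a minimal sketch of how the parameters above could be assembled into a single spark-submit string. The helper `build_submit_parameters` and the dictionary names are hypothetical. The two Delta-related settings (`spark.sql.extensions` and `spark.sql.catalog.spark_catalog`) are an assumption taken from the Delta Lake setup documentation and are not part of the configuration posted above. It is not confirmed that their absence explains the behaviour described here.

```python
# Sketch: assemble spark-submit parameters for the job as one string.
# ASSUMPTION: the two Delta settings below come from the Delta Lake setup
# documentation; they are NOT in the configuration posted above.

JARS = [
    "s3a://ls-dl-mvp-s3development/spark_jars/delta-core_2.12-2.1.0.jar",
    "s3a://ls-dl-mvp-s3development/spark_jars/delta-storage-2.1.0.jar",
]

# Hypothetical addition: register Delta's SQL extension and catalog.
DELTA_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}

# Resource settings copied from the posted configuration.
RESOURCE_CONF = {
    "spark.executor.cores": "1",
    "spark.executor.memory": "4g",
    "spark.driver.cores": "1",
    "spark.driver.memory": "4g",
    "spark.executor.instances": "1",
}


def build_submit_parameters(jars, extra_conf):
    """Return a '--conf key=value ...' string (hypothetical helper)."""
    conf = {"spark.jars": ",".join(jars)}
    conf.update(extra_conf)
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


params = build_submit_parameters(JARS, {**DELTA_CONF, **RESOURCE_CONF})
print(params)
```

Keeping the settings in dictionaries makes it easy to submit the job with and without the Delta entries and compare the two runs.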

Topic

Analytics

Tags

Amazon EMR, Amazon EMR Serverless

Language

English

anselboero

Asked 2 years ago, 150 views

No answers
