spark.sql not working on EMR (Serverless)


The following script does not create the table in the S3 location indicated by the query. When I tested it locally, the Delta JSON log file was created and contained the information about the created table.

from pyspark.sql import SparkSession

spark = (SparkSession
    .builder
    .enableHiveSupport()
    .appName('omop_ddl')
    .getOrCreate()
    )


spark.sql("""
CREATE OR REPLACE TABLE CONCEPT (
  CONCEPT_ID LONG,
  CONCEPT_NAME STRING,
  DOMAIN_ID STRING,
  VOCABULARY_ID STRING,
  CONCEPT_CLASS_ID STRING,
  STANDARD_CONCEPT STRING,
  CONCEPT_CODE STRING,
  VALID_START_DATE DATE,
  VALID_END_DATE DATE,
  INVALID_REASON STRING
) USING DELTA
LOCATION 's3a://ls-dl-mvp-s3deltalake/health_lakehouse/silver/concept';
""")
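One way to check whether the `CREATE TABLE` was actually committed is to inspect the first entry of the Delta transaction log at the target location. A minimal sketch, independent of Spark (the `_delta_log/00000000000000000000.json` layout and the `metaData` action shape follow the documented Delta log protocol; the sample line below is illustrative, not real output):

```python
import json

# Each Delta commit is a file of newline-delimited JSON "actions".
# The first commit of a CREATE TABLE contains a "metaData" action
# with the table id, format, and schema.
def metadata_action(commit_text):
    """Return the metaData action from a Delta commit file, or None."""
    for line in commit_text.splitlines():
        action = json.loads(line)
        if "metaData" in action:
            return action["metaData"]
    return None

# Illustrative commit line, shaped like a Delta metaData action:
sample = json.dumps({"metaData": {"id": "abc",
                                  "format": {"provider": "parquet"},
                                  "schemaString": "{}",
                                  "partitionColumns": []}})
meta = metadata_action(sample)
print(meta is not None)
```

If the job really wrote the table, downloading `_delta_log/00000000000000000000.json` from the S3 prefix and feeding it to this helper should yield a non-`None` result.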

The job is submitted with the following configuration parameters:

--conf spark.jars=s3a://ls-dl-mvp-s3development/spark_jars/delta-core_2.12-2.1.0.jar,s3a://ls-dl-mvp-s3development/spark_jars/delta-storage-2.1.0.jar 
--conf spark.executor.cores=1 
--conf spark.executor.memory=4g 
--conf spark.driver.cores=1 
--conf spark.driver.memory=4g 
--conf spark.executor.instances=1 
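For comparison, the Delta Lake 2.1 documentation says Spark also needs the Delta SQL extension and catalog configured before `USING DELTA` statements are handled by Delta; a sketch of the same submit parameters with those two settings added (jar paths reproduced from above; the extension and catalog class names are taken from the Delta docs, not from this job):

```shell
--conf spark.jars=s3a://ls-dl-mvp-s3development/spark_jars/delta-core_2.12-2.1.0.jar,s3a://ls-dl-mvp-s3development/spark_jars/delta-storage-2.1.0.jar
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
--conf spark.executor.cores=1
--conf spark.executor.memory=4g
--conf spark.driver.cores=1
--conf spark.driver.memory=4g
--conf spark.executor.instances=1
```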

I also tried modifying the location in the query to point at a non-existent bucket, and the script still did not fail. Am I forgetting something? Thank you very much for your help.

Asked 2 years ago · 150 views
No answers
