The following script does not create the table at the S3 location given in the query. When I test it locally, the Delta log (JSON) file is created and contains the metadata of the new table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .enableHiveSupport()
    .appName('omop_ddl')
    .getOrCreate()
)
spark.sql("""
    CREATE OR REPLACE TABLE CONCEPT (
        CONCEPT_ID LONG,
        CONCEPT_NAME STRING,
        DOMAIN_ID STRING,
        VOCABULARY_ID STRING,
        CONCEPT_CLASS_ID STRING,
        STANDARD_CONCEPT STRING,
        CONCEPT_CODE STRING,
        VALID_START_DATE DATE,
        VALID_END_DATE DATE,
        INVALID_REASON STRING
    )
    USING DELTA
    LOCATION 's3a://ls-dl-mvp-s3deltalake/health_lakehouse/silver/concept'
""")
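To confirm what the local run actually wrote, the first commit file under `_delta_log/` can be parsed directly. A minimal sketch that extracts the table schema from a commit file (the sample line below is illustrative, not taken from the actual run):

```python
import json

def schema_from_commit(lines):
    """Return the parsed table schema from a Delta commit file.

    Each line of a commit file under _delta_log/ is one JSON action
    (commitInfo, protocol, metaData, add, ...); the table schema is
    stored in the metaData action as an escaped JSON string.
    """
    for line in lines:
        action = json.loads(line)
        if "metaData" in action:
            return json.loads(action["metaData"]["schemaString"])
    return None

# Illustrative metaData action, shaped like what Delta 2.1 writes:
sample = [
    '{"metaData": {"id": "0", "schemaString": '
    '"{\\"type\\":\\"struct\\",\\"fields\\":'
    '[{\\"name\\":\\"CONCEPT_ID\\",\\"type\\":\\"long\\",'
    '\\"nullable\\":true,\\"metadata\\":{}}]}"}}'
]
schema = schema_from_commit(sample)
print(schema["fields"][0]["name"])  # CONCEPT_ID
```

In a real run the lines would come from opening the commit file itself, e.g. `00000000000000000000.json` inside the table's `_delta_log/` directory.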
These are the configuration parameters I pass to spark-submit:
--conf spark.jars=s3a://ls-dl-mvp-s3development/spark_jars/delta-core_2.12-2.1.0.jar,s3a://ls-dl-mvp-s3development/spark_jars/delta-storage-2.1.0.jar
--conf spark.executor.cores=1
--conf spark.executor.memory=4g
--conf spark.driver.cores=1
--conf spark.driver.memory=4g
--conf spark.executor.instances=1
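For reference, the Delta Lake 2.1 docs list two additional session configs (the Delta SQL extension and the Delta catalog implementation) for running Delta SQL on plain Spark; a sketch of the same submit with them added, reusing the jar paths above (the script name is illustrative):

```shell
spark-submit \
  --conf spark.jars=s3a://ls-dl-mvp-s3development/spark_jars/delta-core_2.12-2.1.0.jar,s3a://ls-dl-mvp-s3development/spark_jars/delta-storage-2.1.0.jar \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=4g \
  --conf spark.driver.cores=1 \
  --conf spark.driver.memory=4g \
  --conf spark.executor.instances=1 \
  omop_ddl.py  # illustrative script name
```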
As a test, I changed the location in the query to a non-existent bucket and the script still did not raise an error. Am I forgetting something?
Thank you very much for your help.