Hi anselboro,
According to the EMR Serverless Delta Lake documentation, using Delta Lake on EMR Serverless requires additional Spark configuration, and the exact settings depend on the EMR release.
If you can tell me which EMR release you are using, I can give more specific configuration steps for it.
For EMR 6.9.0 and higher 6.x releases (Delta Lake included):
--conf spark.jars=/usr/share/aws/delta/lib/delta-core.jar,/usr/share/aws/delta/lib/delta-storage.jar --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
For EMR 7.0.0 and higher (Delta Lake 3.0.0+):
--conf spark.jars=/usr/share/aws/delta/lib/delta-spark.jar,/usr/share/aws/delta/lib/delta-storage.jar --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
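As a rough sketch of how the two cases above differ, the Python helper below builds the parameter string from a release label. The function name `delta_submit_params` and the label format (`emr-6.9.0`) are assumptions made for illustration; only the jar paths and the two `spark.sql.*` settings come from the commands above.

```python
import re

# Jar directory and settings quoted from the commands above.
DELTA_LIB = "/usr/share/aws/delta/lib"
COMMON_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def delta_submit_params(release_label: str) -> str:
    """Build the --conf string for a release label such as 'emr-6.9.0'.

    Hypothetical helper: the name and the label parsing are illustrative.
    """
    match = re.fullmatch(r"(?:emr-)?(\d+)\.(\d+)\.(\d+)", release_label.strip())
    if match is None:
        raise ValueError(f"unrecognized release label: {release_label!r}")
    major, minor, _patch = (int(part) for part in match.groups())
    if (major, minor) < (6, 9):
        raise ValueError("releases before 6.9.0 are not covered by this answer")
    # EMR 7.0.0 and higher (Delta Lake 3.0.0+) uses delta-spark.jar
    # in place of delta-core.jar.
    main_jar = "delta-spark.jar" if major >= 7 else "delta-core.jar"
    jars = f"{DELTA_LIB}/{main_jar},{DELTA_LIB}/delta-storage.jar"
    conf = {"spark.jars": jars, **COMMON_CONF}
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


print(delta_submit_params("emr-6.9.0"))
```

The resulting string could then be pasted into the job's Spark submit parameters; check it against the documentation for your exact release before relying on it.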
What You're Missing
Your current configuration:
--conf spark.jars=s3a://ls-dl-mvp-s3development/spark_jars/delta-core_2.12-2.1.0.jar,s3a://ls-dl-mvp-s3development/spark_jars/delta-storage-2.1.0.jar
Missing configurations:
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
Without these two settings, Spark does not load the Delta SQL extension or register the Delta catalog, so USING DELTA is not recognized.
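If it helps, here is a small sketch of a pre-flight check that reports which of the two settings are absent from a parameter string. The helper names `parse_conf` and `missing_delta_conf` are hypothetical; the required values are the ones listed above, and the sample input is your current configuration.

```python
import shlex

# The two settings listed above as missing.
REQUIRED_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def parse_conf(params: str) -> dict:
    """Collect the key=value pair that follows each --conf flag."""
    tokens = shlex.split(params)
    conf = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "--conf" and "=" in value:
            key, _, val = value.partition("=")
            conf[key] = val
    return conf


def missing_delta_conf(params: str) -> list:
    """Return the required keys whose expected value is not present."""
    conf = parse_conf(params)
    # A setting may hold a comma-separated list (e.g. several extensions),
    # so look for the expected value among the listed entries.
    return [
        key
        for key, expected in REQUIRED_CONF.items()
        if expected not in conf.get(key, "").split(",")
    ]


current = (
    "--conf spark.jars=s3a://ls-dl-mvp-s3development/spark_jars/delta-core_2.12-2.1.0.jar,"
    "s3a://ls-dl-mvp-s3development/spark_jars/delta-storage-2.1.0.jar"
)
print(missing_delta_conf(current))
```

For the current configuration this reports both keys; once the two `--conf` entries are appended, the list should come back empty.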
Thank you.
