Hello,
This answer assumes that you have already built the jars for your specific Spark version, as described in the instructions at https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore.
I was able to connect to my Glue catalog tables successfully by following the steps below:

- I built a Spark Docker image and pushed it to an ECR repository, following the official instructions [1].
- On top of that base image, I built a second Spark image that adds the Glue Hive catalog client jars mentioned on the GitHub page, and pushed this patched image to the ECR repository as well.
- I created an EKS cluster, along with a namespace and a service account dedicated to Spark jobs.
- I downloaded Spark on my local machine and wrote a small PySpark script that reads from my Glue table (a sketch of such a script is shown after the command below).
- Finally, I ran the `spark-submit` command below, which completed successfully:
```
spark-submit \
  --master k8s://https://<Kubernetes url> \
  --deploy-mode cluster \
  --name spark-pi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=<IMAGE_NAME> \
  --conf spark.kubernetes.namespace=<NAMESPACE> \
  --conf spark.kubernetes.executor.request.cores=1 \
  --conf spark.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory \
  --conf spark.hive.metastore.glue.catalogid=<AWS ACCOUNT ID> \
  --conf spark.hive.imetastoreclient.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory \
  --conf spark.kubernetes.file.upload.path=s3a://Bucket/ \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=<SERVICE ACCOUNT NAME> \
  script.py
```
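For reference, here is a minimal sketch of what such a PySpark script (`script.py`) might look like. The database and table names (`my_database`, `my_table`) are placeholders for illustration, not values from the setup above:

```python
# Minimal sketch: read a table registered in the AWS Glue Data Catalog.
# "my_database" and "my_table" are hypothetical placeholders.
from pyspark.sql import SparkSession

# enableHiveSupport() activates the Hive metastore client; the Glue client
# factory classes passed via --conf at submit time redirect metastore calls
# to the Glue Data Catalog.
spark = (
    SparkSession.builder
    .appName("glue-catalog-read")
    .enableHiveSupport()
    .getOrCreate()
)

# Glue databases and tables appear to Spark as regular Hive tables.
df = spark.table("my_database.my_table")
df.show(10)

spark.stop()
```

Note that the Glue-specific settings live entirely in the `spark-submit` configuration, so the script itself needs no Glue-specific code.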
Hope this information helps!
Reference:
[1] https://spark.apache.org/docs/latest/running-on-kubernetes.html#:~:text=It%20can%20be%20found%20in,use%20with%20the%20Kubernetes%20backend