Accessing the Glue Data Catalog from a Spark program


A client created an EMR cluster as instructed here: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html, along with a data catalog in Glue. The client then tried to access a table with the code below but received an error saying that the table "ops.eventnote" doesn't exist. We confirmed that the table is in the catalog. Is there a different way to specify the Glue context?

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public class TestAWSGlueCatalog {
    private static SparkSession session;
    private static SQLContext sqlContext;

    public static void main(final String[] args) throws Exception {
        try {
            session = SparkSession.builder().appName("Operation Metrics Transformation")
                    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                    .getOrCreate();
            // Placeholder S3 credentials
            session.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", "access-key");
            session.sparkContext().hadoopConfiguration().set("fs.s3a.secret.key", "secret-key");
            sqlContext = session.sqlContext();
            // This query fails: the table "ops.eventnote" is reported as not existing
            final Dataset<Row> rows = sqlContext.sql("select * from ops.eventnote");
            rows.show();
        } catch (final Exception e) {
            e.printStackTrace();
            throw e;
        }
    }
}
Asked 7 years ago · Viewed 3467 times
1 Answer
Accepted Answer

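Make sure to call enableHiveSupport() when building the SparkSession; you can then run SQL directly with SparkSession.sql. Without Hive support, Spark typically falls back to its own in-memory catalog, so tables registered in the Glue Data Catalog are not visible.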

A Python example is below. It works the same way in Java or Scala.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
spark.sql("show tables").show()
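
Since the question uses Java, here is a rough Java equivalent of the Python snippet above. It is only a sketch: the class name GlueCatalogExample and the buildSession helper are made up for illustration, and it assumes the EMR cluster was created with the Glue Data Catalog configured as the metastore, as described in the linked guide.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical class name, used only for this illustration.
public class GlueCatalogExample {

    // Builds a session with Hive support enabled so that Spark SQL resolves
    // tables through the cluster's configured metastore (assumed to be Glue).
    static SparkSession buildSession(final String appName) {
        return SparkSession.builder()
                .appName(appName)
                .enableHiveSupport()
                .getOrCreate();
    }

    public static void main(final String[] args) {
        final SparkSession spark = buildSession("Test");

        // List the tables visible in the current database.
        spark.sql("show tables").show();

        // Query the table from the question by its database-qualified name.
        final Dataset<Row> rows = spark.sql("select * from ops.eventnote");
        rows.show();

        spark.stop();
    }
}
```

Compared with the code in the question, the only change that should matter is the enableHiveSupport() call; the separate SQLContext is not needed because SparkSession.sql can run the query directly.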
AWS
Answered 7 years ago
Expert
Reviewed 23 days ago
