Accessing Glue Data Catalog from Spark program

0

Client creates EMR cluster as instructed here: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html, along with a data catalog in Glue. Client then attempts to access table with code below but receives error, “ops.eventnote” table doesn’t exist. Confirmed table is in catalog. Is there a different way to specify Glue context?

public class TestAWSGlueCatalog {
                    private static SparkSession session;
                    private static SQLContext sqlContext;

                    public static void main(final String[] args) throws Exception {
                        try {
                            session = SparkSession.builder().appName("Operation Metrics Transformation")
                                    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                                    .getOrCreate();
                            session.sparkContext().hadoopConfiguration()
                            .set("fs.s3a.access.key", "access-key");
                            session.sparkContext().hadoopConfiguration()
                            .set("fs.s3a.secret.key", "secret-key");
                            sqlContext = session.sqlContext();
                            final Dataset<Row> rows = sqlContext
                                    .sql("select * from ops.eventnote");
                             rows.show();
                        } catch (final Exception e) {
                            e.printStackTrace();
                            throw e;
                        }
                    }
feita há 7 anos3469 visualizações
1 Resposta
0
Resposta aceita

Make sure to enableHiveSupport and you can directly use SparkSession.sql to execute sql.

Python example is below. Works the same in Java or Scala.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
spark.sql("show tables").show()
AWS
respondido há 7 anos
profile picture
ESPECIALISTA
avaliado há 24 dias

Você não está conectado. Fazer login para postar uma resposta.

Uma boa resposta responde claramente à pergunta, dá feedback construtivo e incentiva o crescimento profissional de quem perguntou.

Diretrizes para responder a perguntas