Accessing Glue Data Catalog from Spark program

0

Client creates EMR cluster as instructed here: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html, along with a data catalog in Glue. Client then attempts to access table with code below but receives error, “ops.eventnote” table doesn’t exist. Confirmed table is in catalog. Is there a different way to specify Glue context?

public class TestAWSGlueCatalog {
                    private static SparkSession session;
                    private static SQLContext sqlContext;

                    public static void main(final String[] args) throws Exception {
                        try {
                            session = SparkSession.builder().appName("Operation Metrics Transformation")
                                    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                                    .getOrCreate();
                            session.sparkContext().hadoopConfiguration()
                            .set("fs.s3a.access.key", "access-key");
                            session.sparkContext().hadoopConfiguration()
                            .set("fs.s3a.secret.key", "secret-key");
                            sqlContext = session.sqlContext();
                            final Dataset<Row> rows = sqlContext
                                    .sql("select * from ops.eventnote");
                             rows.show();
                        } catch (final Exception e) {
                            e.printStackTrace();
                            throw e;
                        }
                    }
posta 7 anni fa3469 visualizzazioni
1 Risposta
0
Risposta accettata

Make sure to enableHiveSupport and you can directly use SparkSession.sql to execute sql.

Python example is below. Works the same in Java or Scala.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
spark.sql("show tables").show()
AWS
con risposta 7 anni fa
profile picture
ESPERTO
verificato 24 giorni fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande