Accessing Glue Data Catalog from Spark program

0

Client creates EMR cluster as instructed here: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html, along with a data catalog in Glue. Client then attempts to access table with code below but receives error, “ops.eventnote” table doesn’t exist. Confirmed table is in catalog. Is there a different way to specify Glue context?

public class TestAWSGlueCatalog {
                    private static SparkSession session;
                    private static SQLContext sqlContext;

                    public static void main(final String[] args) throws Exception {
                        try {
                            session = SparkSession.builder().appName("Operation Metrics Transformation")
                                    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                                    .getOrCreate();
                            session.sparkContext().hadoopConfiguration()
                            .set("fs.s3a.access.key", "access-key");
                            session.sparkContext().hadoopConfiguration()
                            .set("fs.s3a.secret.key", "secret-key");
                            sqlContext = session.sqlContext();
                            final Dataset<Row> rows = sqlContext
                                    .sql("select * from ops.eventnote");
                             rows.show();
                        } catch (final Exception e) {
                            e.printStackTrace();
                            throw e;
                        }
                    }
preguntada hace 7 años3469 visualizaciones
1 Respuesta
0
Respuesta aceptada

Make sure to enableHiveSupport and you can directly use SparkSession.sql to execute sql.

Python example is below. Works the same in Java or Scala.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
spark.sql("show tables").show()
AWS
respondido hace 7 años
profile picture
EXPERTO
revisado hace 24 días

No has iniciado sesión. Iniciar sesión para publicar una respuesta.

Una buena respuesta responde claramente a la pregunta, proporciona comentarios constructivos y fomenta el crecimiento profesional en la persona que hace la pregunta.

Pautas para responder preguntas