Accessing Glue Data Catalog from Spark program

The client created an EMR cluster as instructed here: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html, along with a data catalog in Glue. The client then tried to access a table with the code below but received an error saying that the table "ops.eventnote" doesn't exist. We confirmed that the table is in the catalog. Is there a different way to specify the Glue context?

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public class TestAWSGlueCatalog {
    private static SparkSession session;
    private static SQLContext sqlContext;

    public static void main(final String[] args) throws Exception {
        try {
            session = SparkSession.builder().appName("Operation Metrics Transformation")
                    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                    .getOrCreate();
            // S3 credentials (placeholders)
            session.sparkContext().hadoopConfiguration()
                    .set("fs.s3a.access.key", "access-key");
            session.sparkContext().hadoopConfiguration()
                    .set("fs.s3a.secret.key", "secret-key");
            sqlContext = session.sqlContext();
            // This query fails with "table doesn't exist"
            final Dataset<Row> rows = sqlContext
                    .sql("select * from ops.eventnote");
            rows.show();
        } catch (final Exception e) {
            e.printStackTrace();
            throw e;
        }
    }
}
asked 7 years ago, 3467 views
1 answer
Accepted answer

Make sure to call enableHiveSupport() when building the SparkSession. You can then use SparkSession.sql directly to execute SQL.

A Python example is below. It works the same way in Java or Scala.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
spark.sql("show tables").show()
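Since the question was asked in Java, here is a rough Java equivalent of the Python snippet above. It is only a sketch: the class name GlueCatalogQuery is made up for illustration, and it assumes the job runs on an EMR cluster that was launched with the Glue Data Catalog enabled as described in the linked guide, and that the ops.eventnote table from the question exists in that catalog.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical class name, used only for this example.
public class GlueCatalogQuery {
    public static void main(final String[] args) {
        // enableHiveSupport() makes Spark SQL resolve tables through the Hive
        // metastore client. Assumption: the cluster is configured so that this
        // client points at the Glue Data Catalog.
        final SparkSession spark = SparkSession.builder()
                .appName("Operation Metrics Transformation")
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .enableHiveSupport()
                .getOrCreate();

        // List the tables in the ops database to confirm the catalog is reachable.
        spark.sql("show tables in ops").show();

        // Same query as in the question, issued through SparkSession.sql.
        final Dataset<Row> rows = spark.sql("select * from ops.eventnote");
        rows.show();

        spark.stop();
    }
}
```

If "show tables in ops" does not list eventnote, the session is probably still using a different metastore rather than Glue, so the cluster configuration is worth checking before the query itself.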
AWS
answered 7 years ago
EXPERT
reviewed 23 days ago
