A client created an EMR cluster by following the instructions at http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html, along with a data catalog in Glue. The client then tries to read a table with the code below but gets an error saying that the table “ops.eventnote” doesn’t exist. We have confirmed that the table is in the catalog. Is there a different way to specify the Glue context?
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public class TestAWSGlueCatalog {
    private static SparkSession session;
    private static SQLContext sqlContext;

    public static void main(final String[] args) throws Exception {
        try {
            session = SparkSession.builder().appName("Operation Metrics Transformation")
                    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                    .getOrCreate();
            // S3 credentials ("access-key" and "secret-key" are placeholders)
            session.sparkContext().hadoopConfiguration()
                    .set("fs.s3a.access.key", "access-key");
            session.sparkContext().hadoopConfiguration()
                    .set("fs.s3a.secret.key", "secret-key");
            sqlContext = session.sqlContext();
            // This query fails with the "table doesn't exist" error
            final Dataset<Row> rows = sqlContext
                    .sql("select * from ops.eventnote");
            rows.show();
        } catch (final Exception e) {
            e.printStackTrace();
            throw e;
        }
    }
}
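One thing that may be worth checking: the builder in the code above never enables Hive support, and without it Spark SQL resolves table names against its own in-memory catalog rather than an external metastore. The sketch below collects the two settings that, as far as I can tell, would be involved. It assumes the cluster was not already given the Glue client factory through the spark-hive-site classification described in the linked guide. GlueCatalogSettings and sparkConf are hypothetical names made up for this illustration, and passing the Hive property through the spark.hadoop. prefix is an assumption about how it reaches the metastore client, not something confirmed for this cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Spark or the AWS SDK.
final class GlueCatalogSettings {
    private GlueCatalogSettings() {}

    // Returns the Spark properties this sketch assumes are needed so that
    // Spark SQL looks tables up in the Glue Data Catalog.
    static Map<String, String> sparkConf() {
        final Map<String, String> conf = new LinkedHashMap<>();
        // Same setting that SparkSession.builder().enableHiveSupport() applies.
        conf.put("spark.sql.catalogImplementation", "hive");
        // Assumption: the Glue client factory is not already configured on the
        // cluster, so it is passed to the Hive client via the spark.hadoop. prefix.
        conf.put("spark.hadoop.hive.metastore.client.factory.class",
                "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory");
        return conf;
    }

    // With Spark on the classpath, the map could be applied like this:
    //   SparkSession.Builder builder = SparkSession.builder()
    //           .appName("Operation Metrics Transformation");
    //   GlueCatalogSettings.sparkConf().forEach((k, v) -> builder.config(k, v));
    //   SparkSession session = builder.getOrCreate();
}
```

If the cluster already has the Glue factory class configured, calling .enableHiveSupport() on the builder before .getOrCreate() may be all that is needed. These settings have to be in place before the session is first created, since the catalog implementation cannot be changed on an existing session.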