Hi there,
I'm trying to read an Aurora PostgreSQL table with Spark on EMR. The table has been crawled successfully, and the corresponding table has been created in the Glue Data Catalog. The EMR cluster is configured to use the Glue Data Catalog for Spark, with the settings described in our documentation:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html
However, when I query the table in Spark, I get the following error:
scala> spark.sql("SELECT * FROM `aurora-glue`.`glue_public_distributors`")
18/09/11 14:17:40 WARN CredentialsLegacyConfigLocationProvider: Found the legacy config profiles file at [/home/hadoop/.aws/config]. Please move it to the latest default location [~/.aws/credentials].
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table glue_public_distributors. StorageDescriptor#InputFormat cannot be null for table: glue_public_distributors (Service: null; Status Code: 0; Error Code: null; Request ID: null);
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
  at org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:808)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:385)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.isRunningDirectlyOnFiles(Analyzer.scala:682)
...
Any ideas what I am doing wrong here?