Reading Aurora Postgress Table with Spark SQL on EMR

0

Hi there,

I'm trying to read an Aurora Postgres table with Spark on EMR. The Aurora Postgres table has been successfully crawled and the respective table in the Glue Data Catalog has been created. The EMR cluster has been configured with Glue Data Catalog for Spark and the configurations mentioned in our documentation.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html

However, when I'm running queries against the table in Spark, I'm getting the following error.

scala> spark.sql("SELECT * FROM `aurora-glue`.`glue_public_distributors`")
18/09/11 14:17:40 WARN CredentialsLegacyConfigLocationProvider: Found the legacy config profiles file at [/home/hadoop/.aws/config]. Please move it to the latest default location [~/.aws/credentials].
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table glue_public_distributors. StorageDescriptor#InputFormat cannot be null for table: glue_public_distributors (Service: null; Status Code: 0; Error Code: null; Request ID: null);
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
  at org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:808)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:385)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.isRunningDirectlyOnFiles(Analyzer.scala:682)
...

Any ideas what I am doing wrong here?

已提問 6 年前檢視次數 1755 次
1 個回答
0
已接受的答案

You cannot run Spark SQL on Aurora Postgres using the Glue Catalog as a metadata store. Just like Athena, Spark SQL can only use the Glue Catalog for tables in S3. To run Spark SQL on Aurora Postgres you would need to use Spark JDBC packages to directly query the database.

properties = {
    ...
    "driver": "org.postgresql.Driver"
}
jdbc_url='<jdbc-url>'
df=spark.read.jdbc(url=jdbc_url, table='<tablename>', properties=properties)
AWS
已回答 6 年前

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南