- Newest
- Most votes
- Most comments
Hi Camilla11,
I too found the Redshift catalog aspects of external tables confusing. In part IMO this is due Redshift skipping a catalog declaration step when Spectrum was first created. In hindsight I think this may haunt them. However, in your case we can get past it.
The key concept to know is that Redshift Spectrum is really federating an external catalog, typically that catalog is the Glue Data Catalog, but it can also be a Hive Metastore Service running on an EC2 instance, an on-prem host you can reach over the network, or on the master node of an EMR cluster. Because it's a federated catalog most of the external catalog metadata comes from the external catalog. All Redshift is really putting into it's catalog is a pointer to where the federated catalog is via the external schema declaration. This is where the skipping that catalog declaration step coming into play making it confusing. Look carefully you external schema declaration and you see that the one of the key clauses in the statement points to where on the network to find the external catalog. This is somewhat opaque for the Glue Data Catalog because Redshift already knows where to look, but for a Hive Metastore Service it's connection information. You can also look directly at PG_CATALOG.SVV_EXTERNAL_SCHEMAS in the ESOPTIONS column for this same information. I think it's a little clearer now that that view has been extended to cover more than the Spectrum use case and now includes federating to RDS PostgreSQL, Aurora, and other Redshift databases.
So, to solve your original question. There are several ways you can get to the external table metadata from the external data catalog. One way is go to the data catalog itself directly. Chances are you're using the Glue Data Catalog for Redshift external tables and you go to the Glue console or CLI/API to get this table metadata. Another option is you can use the Athena console app to browse the table metadata in the Glue Data Catalog or Hive Metastore. Lastly you should be able to see the external table metadata directly in Redshift by querying PG_CATALOG.SVV_EXTERNAL_TABLES. Look at the PARAMETERS column of that view to see the table properties. The external catalog contains a "Create Time" table property that has the value you're looking for. AFAIK, there is not concept of a last modified time for an external table in the external catalog because by definition the data "in", or more accurately that the external table currently maps to, is managed outside of any database transaction control and thus the catalog does not have an opportunity to know when the data changes because there is no transnational control over the data.
I hope this helps you get to the metadata you need.
Regards,
-Kurt
Thank you!
That is what I thought. When I look at the parameters I see only 1 field, "transient_lastDdlTime" which I plan to use. Thanks!
Hi Camilla11,
The external table properties you get are indicative of what created the table. Different SQL engines or the service APIs will set different table properties. You should be able to pattern match the table properties to a set of likely external table creation engines.
Regards,
-Kurt
Hi Camilla11,
The external table properties you get are indicative of what created the table. Different SQL engines or the service APIs will set different table properties. You should be able to pattern match the table properties to a set of likely external table creation engines.
Regards,
-Kurt
Relevant content
- Accepted Answerasked 4 years ago
- asked 3 years ago
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated a year ago
- AWS OFFICIALUpdated 2 years ago