Boto3 AWS Glue Crawler: how to get last run table changes?


In the AWS console, we can see if a crawler run returned with changes in schema for particular tables:

[Image: crawler UI]

If I were to click on where it says "3 table changes", I would see the names of the tables that changed in that run.

Using boto3 or the API, how can I do that?

Using get_crawler_metrics only returns the number of affected tables, not the names of the tables.

EDIT:

After considering the accepted answer, I've solved my problem by calling the GetTables API recursively until all tables in the database have been returned. Each table has a last-updated timestamp (`UpdateTime`), which I use to check which tables were updated recently. This is enough for me: I don't care what happened to the tables, only that they changed recently, because I run this check immediately after the crawler finishes a run.
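A minimal sketch of the timestamp check described above, assuming a boto3 Glue client. The function name `tables_updated_since`, the database name `my_database`, and the 15-minute window are illustrative and not part of the original post. It uses the `get_tables` paginator instead of manual recursion over `NextToken`.

```python
from datetime import datetime, timedelta, timezone


def tables_updated_since(glue, database, since):
    """Return names of tables in `database` whose UpdateTime is at or after `since`.

    `glue` is expected to be a boto3 Glue client; `since` should be a
    timezone-aware datetime, because boto3 returns timezone-aware timestamps.
    """
    changed = []
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            updated = table.get("UpdateTime")
            # Skip tables that carry no UpdateTime at all.
            if updated is not None and updated >= since:
                changed.append(table["Name"])
    return changed


# Example usage (hypothetical names; requires AWS credentials and a region):
#
#   import boto3
#   glue = boto3.client("glue")
#   cutoff = datetime.now(timezone.utc) - timedelta(minutes=15)
#   print(tables_updated_since(glue, "my_database", cutoff))
```

The cutoff is a judgment call. A fixed window like the one above only works if the check runs right after the crawler finishes. The crawler's last start time, if you have it recorded, would likely be a tighter bound.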

asked 2 months ago · 422 views
1 Answer
Accepted Answer

I'd suggest using the AWS Glue Data Catalog APIs, such as GetTable or GetTables, to programmatically retrieve the metadata of all tables in a database around a crawler run. By comparing the table metadata from before the run with the metadata from after it, you can identify which tables were created, updated, or deleted. See the GetTable and GetTables API references for details.
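One way the before/after comparison might look, as a sketch only. The helper names `snapshot_tables` and `diff_snapshots` are hypothetical, and treating a changed `UpdateTime` as "updated" is an assumption; comparing column definitions instead would be stricter about what counts as a schema change.

```python
def snapshot_tables(glue, database):
    """Map table name -> UpdateTime for every table in `database`.

    `glue` is expected to be a boto3 Glue client.
    """
    snapshot = {}
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            snapshot[table["Name"]] = table.get("UpdateTime")
    return snapshot


def diff_snapshots(before, after):
    """Classify table names as created, updated, or deleted between two snapshots."""
    created = sorted(after.keys() - before.keys())
    deleted = sorted(before.keys() - after.keys())
    # Assumption: a table counts as updated when its UpdateTime differs.
    updated = sorted(
        name for name in before.keys() & after.keys()
        if before[name] != after[name]
    )
    return {"created": created, "updated": updated, "deleted": deleted}


# Example usage (hypothetical database name; requires AWS credentials):
#
#   import boto3
#   glue = boto3.client("glue")
#   before = snapshot_tables(glue, "my_database")
#   ... start the crawler and wait for it to finish ...
#   after = snapshot_tables(glue, "my_database")
#   print(diff_snapshots(before, after))
```

Taking the first snapshot just before starting the crawler keeps unrelated catalog edits out of the diff.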

The crawler logs and the AWS Glue console are other options, but they may not suit programmatic use. For ad-hoc exploration, the console is the most direct way to see the names of the tables affected by a specific crawler run. In code, the Data Catalog APIs let you derive the same information by comparing table metadata.

AWS
answered 2 months ago
EXPERT
reviewed 2 months ago
