Boto3 AWS Glue Crawler: how to get last run table changes?

0

In the AWS console, we can see if a crawler run returned with changes in schema for particular tables:

crawler ui

If I were to click on where it says "3 table changes", I would see the names of the tables that changed in that run.

Using boto3 or the API, how can I do that?

Using get_crawler_metrics only returns the number of affected tables, not the names of the tables.

EDIT:

After considering the accepted answer, I've solved my problem by recursively calling the GetTables API. Tables have a last updated timestamp. I used that to check which tables have been updated recently. This is enough for me, as I don't care what happened to the tables, just that they changed recently, because I do this check immediately after the crawler finished a run.

已提问 2 个月前422 查看次数
1 回答
1
已接受的回答

I'd suggest using the AWS Glue Data Catalog APIs like GetTable or GetTables to programmatically retrieve the metadata of all tables in a database after a crawler run. By comparing the table metadata before and after a run, you can identify which tables were created, updated or deleted. Get Table Get Tables

The crawler logs and AWS Glue console are other options but may not be suitable. For ad-hoc exploration, the AWS Glue console provides the most direct way to see the names of tables affected by a specific crawler run. However, through code, the Data Catalog APIs provide a way to programmatically retrieve this information by comparing table metadata.

AWS
已回答 2 个月前
profile picture
专家
已审核 2 个月前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则