Skip to content

Glue Crawler JDBC not deleting tables

0

I'm trying to use Glue to detect schema changes in an Aurora PostgresSQL database. I've set up a Glue crawler against a JDBC connection. I ran it and successfully populated the relational database tables into the Glue database.

In Postgres, I created a table named tmp_deleteme. I ran the crawler again, and it was added to the Glue database.

In Postgres, I dropped the table tmp_deleteme. I ran the crawler again and the crawler did not detect that the Postgres table was dropped. It remains in the Glue database. I've tried this several times with several temp tables and I can't get the crawler to recognize that the table was dropped.

The crawler is configured thusly:

        "RecrawlPolicy": {
            "RecrawlBehavior": "CRAWL_EVERYTHING"
        },
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "DELETE_FROM_DATABASE"
        },
        "LineageConfiguration": {
            "CrawlerLineageSettings": "DISABLE"
        },

I would think the DeleteBehavior setting would cause the expected behavior. There are no errors in the crawler log or in the RDS database log.

Am I missing something?

1 Answer
-1
    It sounds like you're experiencing an issue with the AWS Glue crawler not deleting tables from the Glue Data Catalog when they've been dropped from your PostgreSQL database.

    Your configuration looks correct with the SchemaChangePolicy set to "DeleteBehavior": "DELETE_FROM_DATABASE", which should remove tables from the Glue Data Catalog when they no longer exist in the source database.

    Since the crawler isn't detecting the dropped tables, here are some potential issues to check:

    1. JDBC connection configuration: Ensure your JDBC URL is correctly configured to connect to the right database schema where the tables were dropped.

    2. IAM permissions: Verify that the IAM role used by the crawler has sufficient permissions to both read from the database and update/delete tables in the Glue Data Catalog.

    3. Network connectivity: Check that there are no networking issues preventing proper communication between AWS Glue and your Aurora PostgreSQL database.

    4. Database credentials and permissions: Ensure the database user credentials configured in your connection have the proper permissions to view schema information.

    If the crawler is successfully adding new tables but not removing deleted ones, you might need to manually delete the tables from your Glue Data Catalog using the AWS Glue API or SDK. You can use methods like `delete_table()` with the appropriate database name and table name parameters.

    As a workaround, you could try recreating the crawler or adjusting the include path to be more specific to the tables you want to track.
  

Sources
Troublehsoot AWS Glue connection to a JDBC data source | AWS re:Post
Use DeleteTable with an AWS SDK - AWS Glue
AWS Glue examples using SDK for Python (Boto3) - AWS SDK Code Examples

answered 9 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.