How to enable the option in CDK where Crawler should not update the data catalog

Hello,

I don't want the crawler to update the Data Catalog when a column is added to or deleted from the JDBC source table. When the crawler detects a schema change in the data store, it should ignore the change and not update the table in the Data Catalog. I also want it to ignore objects that have been deleted from the data store.

I found the option I need in the crawler settings in the console, but I don't know how to enable it from CDK.

Below is part of my CDK code:

const L2crawler = new GlueCrawler(this, 'pos_crawler', {
  name: 'POS',
  databaseName: 'pos', //glue_db.ref,
  targets: [
    new JdbcTarget(obs_gluedbconnection, 'POS/SSE'),
  ],
});


const glue_trigger_crawler = new aws_glue.CfnTrigger(
  this,
  "glue_trigger_pos_crawler",
  {
    name: "Run-Crawler-" + 'POS',
    workflowName: glue_workflow.name,
    actions: [
      {
        crawlerName: 'POS',
        timeout: 120,
      },
    ],
    type: "SCHEDULED",
    schedule: 'cron(45 6,9 * * ? *)', // as written, this runs daily at 06:45 and 09:45 UTC
    startOnCreation: true,
  }
);
// add trigger dependencies on the crawler and the workflow
glue_trigger_crawler.node.addDependency(L2crawler);
glue_trigger_crawler.addDependency(glue_workflow);

// start the replication job once the POS crawler has succeeded
const glue_trigger_L2glue_job = new aws_glue.CfnTrigger(
  this,
  "glue_trigger_pos_glue_job",
  {
    name: "Run-Job-" + 'L2POS table replication',
    workflowName: glue_workflow.name,
    actions: [
      {
        jobName: 'POS table replication',
      },
    ],
    predicate: {
      conditions: [
        {
          logicalOperator: "EQUALS",
          crawlerName: "POS",
          crawlState: "SUCCEEDED",
        },
      ],
    },
    type: "CONDITIONAL",
    startOnCreation: true,
  }
);

1 Answer

In your CDK code, you need to set the crawler's schema change policy. With the L1 construct CfnCrawler, you do this by adding a schemaChangePolicy property to the crawler definition. Here's an example of how you can modify your code:

const L2crawler = new glue.CfnCrawler(this, 'POSCrawler', {
  name: 'POS',
  role: glueRole.roleArn,
  databaseName: 'your_database_name',
  targets: {
    jdbcTargets: [
      {
        // Your JDBC target configuration (connectionName, path, ...)
      }
    ]
  },
  schemaChangePolicy: {
    updateBehavior: 'LOG',
    deleteBehavior: 'LOG'
  }
});

In this configuration:

  • updateBehavior: 'LOG' tells the crawler to ignore schema changes and not update the table in the data catalog. Instead, it will log the changes.
  • deleteBehavior: 'LOG' instructs the crawler to ignore deleted objects in the data store and log the deletions instead of updating the catalog.

By setting both updateBehavior and deleteBehavior to 'LOG', you're effectively telling the crawler to ignore changes and deletions, which matches your requirement to not update the catalog when there are new columns added or deleted in the JDBC source data table.
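
For reference, the values that CloudFormation accepts for these two properties can be written down as TypeScript union types. This is only a sketch: the type, constant and function names are my own and are not part of the CDK, whose CfnCrawler props type both fields as plain strings. Typing them yourself means a misspelled value is caught at compile time rather than at deploy time.

```typescript
// Values accepted by the SchemaChangePolicy of AWS::Glue::Crawler.
// All names in this block are illustrative, not CDK exports.
const UPDATE_BEHAVIORS = ['LOG', 'UPDATE_IN_DATABASE'] as const;
const DELETE_BEHAVIORS = ['LOG', 'DELETE_FROM_DATABASE', 'DEPRECATE_IN_DATABASE'] as const;

type UpdateBehavior = typeof UPDATE_BEHAVIORS[number];
type DeleteBehavior = typeof DELETE_BEHAVIORS[number];

interface SchemaChangePolicy {
  updateBehavior: UpdateBehavior;
  deleteBehavior: DeleteBehavior;
}

// The policy from the answer: log schema changes and deletions, change nothing.
const ignoreAllChanges: SchemaChangePolicy = {
  updateBehavior: 'LOG',
  deleteBehavior: 'LOG',
};

// True only when both behaviors are 'LOG'.
function ignoresChangesAndDeletions(policy: SchemaChangePolicy): boolean {
  return policy.updateBehavior === 'LOG' && policy.deleteBehavior === 'LOG';
}
```

The ignoreAllChanges object has the same shape as the schemaChangePolicy value in the example above, so it could be passed there directly.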

In CDK v2, CfnCrawler is exported from the aws-cdk-lib/aws-glue module, so import it from there (the example assumes the module is imported as glue). If you're using a different construct, such as a higher-level wrapper around the crawler, the property names might differ slightly, but the concept remains the same.
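
If you would rather keep an existing higher-level crawler construct, such as the GlueCrawler in the question, a CDK escape hatch is one possible route. The sketch below assumes the object you pass in exposes addPropertyOverride, as aws-cdk-lib's CfnResource does. The Overridable interface and the function name are hypothetical, and whether L2crawler.node.defaultChild really is the underlying CfnCrawler depends on how that construct is implemented, so check that before relying on it.

```typescript
// Structural stand-in for the one CfnResource method used here, so the
// helper does not depend on a particular construct library (hypothetical name).
interface Overridable {
  addPropertyOverride(propertyPath: string, value: unknown): void;
}

// Sets the CloudFormation SchemaChangePolicy of a crawler resource to LOG/LOG.
// Property paths use CloudFormation casing, not the camelCase CDK prop names.
function ignoreSchemaChanges(resource: Overridable): void {
  resource.addPropertyOverride('SchemaChangePolicy.UpdateBehavior', 'LOG');
  resource.addPropertyOverride('SchemaChangePolicy.DeleteBehavior', 'LOG');
}

// Possible usage inside the stack, ASSUMING the wrapper's default child
// is the CfnCrawler resource:
//
//   ignoreSchemaChanges(L2crawler.node.defaultChild as aws_glue.CfnCrawler);
```

Running cdk synth afterwards and checking that the generated template contains the SchemaChangePolicy block is a quick way to confirm the override landed on the right resource.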

After making these changes, deploy your CDK stack, and the crawler will be configured to ignore schema changes and deletions in the data store.

Sources
Preventing a crawler from changing an existing schema - AWS Glue
Configure a crawler to handle schema changes | AWS re:Post
CatalogSchemaChangePolicy - AWS Glue

answered 12 days ago
EXPERT
reviewed 12 days ago
