
How to create multiple glue job with different schedule in the same CDK stack


Hello,

I have created the Glue job below in CDK. It is scheduled and running fine.

Now I want to create another Glue job, "pos table replication large", and schedule it to run at 9am daily. Can I create multiple Glue jobs in the same CDK stack file? I can reuse the same crawler and JDBC database connection; only the Glue job needs to be created new, with the same job properties as in the code below, and it needs its own schedule.


// Create the crawler
const L2crawler = new GlueCrawler(this, 'pos_crawler', {
  name: 'pos',
  databaseName: 'pos', // glue_db.ref
  targets: [
    new JdbcTarget(pos_gluedbconnection, 'pos/ROD'),
  ],
});

// Glue job for the pos table
const L2gluejob = new GlueJob(this, 'pos_rep_job', {
  jobName: 'pos table replication',
  workerType: 'G.2X',
  numberOfWorkers: 20,
  timeout: Duration.hours(6),
  glueVersion: '4.0',
  connections: [pos_gluedbconnection],
  script: GlueScript.fromAsset(this, 'pos_table_replication', './pos Table Replication.py'),
  outputbucket: GlueBucketPath,
  outputbucketname: GlueBucket
});


    // create glue workflow, triggers, and dependencies between glue resources

    //create glue workflow
    const glue_workflow = new aws_glue.CfnWorkflow(this, "glue-pos-workflow", {
      name: "pos-glue-workflow",
      description:
        "ETL workflow to pull from crawled db into target bucket",
    });

    // Create triggers

    // Scheduled trigger that starts the crawler
    const glue_trigger_crawler = new aws_glue.CfnTrigger(
      this,
      "glue_trigger_pos_crawler",
      {
        name: "Run-Crawler-" + 'pos',
        workflowName: glue_workflow.name,
        actions: [
          {
            crawlerName: 'pos',
            timeout: 120,
          },
        ],
        type: "SCHEDULED",
        schedule: 'cron(0 6,9 * * ? *)', // cron: 06:00 and 09:00 UTC daily
        startOnCreation: true,
      }
    );
    // Make the trigger depend on the crawler and the workflow
    glue_trigger_crawler.node.addDependency(L2crawler);
    glue_trigger_crawler.addDependency(glue_workflow);

    // Conditional trigger: run the Glue job once the crawler has succeeded
    const glue_trigger_L2glue_job = new aws_glue.CfnTrigger(
      this,
      "glue_trigger_pos_glue_job",
      {
        name: "Run-Job-" + 'L2pos table replication',
        workflowName: glue_workflow.name,
        actions: [
          {
            jobName: 'pos table replication',
          },
        ],
        predicate: {
          conditions: [
            {
              logicalOperator: "EQUALS",
              crawlerName: "pos",
              crawlState: "SUCCEEDED",
            },
          ],
        },
        type: "CONDITIONAL",
        startOnCreation: true,
      }
    );    
	
	
    glue_trigger_L2glue_job.node.addDependency(L2crawler);
    glue_trigger_L2glue_job.node.addDependency(L2gluejob);
    glue_trigger_L2glue_job.node.addDependency(GlueBucket);
1 Answer

Yes, you can create multiple Glue jobs in the same CDK stack file. To create another Glue job "pos table replication large" scheduled at 9am daily, you can follow these steps:

  1. Create a new Glue job similar to the existing one:
const L2gluejobLarge = new GlueJob(this, 'pos_rep_job_large', {
  jobName: 'pos table replication large',
  workerType: 'G.2X',
  numberOfWorkers: 20,
  timeout: Duration.hours(6),
  glueVersion: '4.0',
  connections: [pos_gluedbconnection],
  script: GlueScript.fromAsset(this, 'pos_table_replication_large', './pos Table Replication Large.py'),
  outputbucket: GlueBucketPath,
  outputbucketname: GlueBucket
});
  2. Create a new trigger for the new job:
const glue_trigger_L2glue_job_large = new aws_glue.CfnTrigger(
  this,
  "glue_trigger_pos_glue_job_large",
  {
    name: "Run-Job-pos-table-replication-large",
    workflowName: glue_workflow.name,
    actions: [
      {
        jobName: 'pos table replication large',
      },
    ],
    type: "SCHEDULED",
    schedule: 'cron(0 9 * * ? *)', // Run at 09:00 UTC daily
    startOnCreation: true,
  }
);

glue_trigger_L2glue_job_large.node.addDependency(L2gluejobLarge);
glue_trigger_L2glue_job_large.node.addDependency(GlueBucket);

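With this approach, each Glue job in the stack gets its own trigger and its own schedule. The new job reuses the existing JDBC database connection, and no additional crawler is needed. Glue evaluates cron schedules in UTC, so adjust the hour if 9am refers to a local time zone. Construct IDs such as 'pos_rep_job_large' must be unique within the stack, and the script path and any other job-specific properties should be adjusted as needed. One caveat worth checking: as far as I know, a Glue workflow permits only one starting trigger, so if deployment rejects a second SCHEDULED trigger in "pos-glue-workflow", omit workflowName from the new trigger or give the new job its own workflow.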
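
If more scheduled jobs are added later, the trigger properties can be generated from a small list instead of being copied by hand. The following is only a sketch: `ScheduledJobSpec`, `ScheduledTriggerProps`, `dailyCron`, and `scheduledTriggerProps` are hypothetical helper names that are not part of CDK, and the output is a plain object using the same trigger fields as the code above (name, actions, type, schedule, startOnCreation).

```typescript
// Hypothetical description of one scheduled job (illustrative field names).
interface ScheduledJobSpec {
  jobName: string; // Glue job name, the same value passed as jobName to the job
  hourUtc: number; // hour of day (0-23) in UTC at which the job should start
}

// Plain object with the same fields the triggers above pass to CfnTrigger.
interface ScheduledTriggerProps {
  name: string;
  actions: { jobName: string }[];
  type: "SCHEDULED";
  schedule: string;
  startOnCreation: boolean;
}

// Build a cron expression that fires once a day at the given UTC hour.
function dailyCron(hourUtc: number): string {
  if (!Number.isInteger(hourUtc) || hourUtc < 0 || hourUtc > 23) {
    throw new RangeError(`hourUtc must be an integer from 0 to 23, got ${hourUtc}`);
  }
  return `cron(0 ${hourUtc} * * ? *)`;
}

// Derive trigger properties for one job; whitespace in the job name
// is replaced with hyphens to form the trigger name.
function scheduledTriggerProps(spec: ScheduledJobSpec): ScheduledTriggerProps {
  return {
    name: "Run-Job-" + spec.jobName.replace(/\s+/g, "-"),
    actions: [{ jobName: spec.jobName }],
    type: "SCHEDULED",
    schedule: dailyCron(spec.hourUtc),
    startOnCreation: true,
  };
}

// One entry per scheduled job; add more entries as needed.
const jobSpecs: ScheduledJobSpec[] = [
  { jobName: "pos table replication large", hourUtc: 9 },
];

const triggerProps: ScheduledTriggerProps[] = jobSpecs.map(scheduledTriggerProps);
```

Inside the stack, each entry of `triggerProps` could then be passed to `new aws_glue.CfnTrigger(this, id, props)` with a construct ID that is unique per job, followed by the same `addDependency` calls shown above.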
Sources
@aws-cdk/aws-glue-alpha module · AWS CDK
class CfnTrigger (construct) · AWS CDK

answered 12 days ago
EXPERT
reviewed 12 days ago
