
How to create multiple Glue jobs with different schedules in the same CDK stack


Hello,

I have created the Glue job below in CDK. It is scheduled and runs fine.

Now I want to create another Glue job, "pos table replication large", and schedule it to run at 9am daily. Can I create multiple Glue jobs in the same CDK stack file? I can reuse the same crawler and JDBC database connection; only the Glue job itself needs to be new, with the same job properties as in the code below, and it needs its own schedule.


//create crawler
const L2crawler = new GlueCrawler(this, 'pos_crawler', {
      name: 'pos',
      databaseName: 'pos', //glue_db.ref,
      targets: [
        new JdbcTarget(pos_gluedbconnection, 'pos/ROD'),
      ],
    });
    

 //  Glue job pos table 

    const L2gluejob = new GlueJob(this, 'pos_rep_job', {
      jobName: 'pos table replication',
      workerType: 'G.2X',
      numberOfWorkers: 20,
      timeout: Duration.hours(6),
      glueVersion: '4.0',
      connections: [pos_gluedbconnection],
      script: GlueScript.fromAsset(this, 'pos_table_replication', './pos Table Replication.py'),
      outputbucket: GlueBucketPath,
      outputbucketname: GlueBucket
    });


    // create glue workflow, triggers, and dependencies between glue resources

    //create glue workflow
    const glue_workflow = new aws_glue.CfnWorkflow(this, "glue-pos-workflow", {
      name: "pos-glue-workflow",
      description:
        "ETL workflow to pull from crawled db into target bucket",
    });

    //create triggers

    //scheduled trigger that starts the crawler
    const glue_trigger_crawler = new aws_glue.CfnTrigger(
      this,
      "glue_trigger_pos_crawler",
      {
        name: "Run-Crawler-" + 'pos',
        workflowName: glue_workflow.name,
        actions: [
          {
            crawlerName: 'pos',
            timeout: 120,
          },
        ],
        type: "SCHEDULED",
        schedule: 'cron(0 6,9 * * ? *)', // cron: 06:00 and 09:00 UTC daily
        startOnCreation: true,
      }
    );
    //add trigger dependency on crawler and workflow
    glue_trigger_crawler.node.addDependency(L2crawler);
    glue_trigger_crawler.addDependency(glue_workflow);

    //conditional trigger that runs the job once the crawler has succeeded
    const glue_trigger_L2glue_job = new aws_glue.CfnTrigger(
      this,
      "glue_trigger_pos_glue_job",
      {
        name: "Run-Job-" + 'L2pos table replication',
        workflowName: glue_workflow.name,
        actions: [
          {
            jobName: 'pos table replication',
          },
        ],
        predicate: {
          conditions: [
            {
              logicalOperator: "EQUALS",
              crawlerName: "pos",
              crawlState: "SUCCEEDED",
            },
          ],
        },
        type: "CONDITIONAL",
        startOnCreation: true,
      }
    );    
	
	
    glue_trigger_L2glue_job.node.addDependency(L2crawler);
    glue_trigger_L2glue_job.node.addDependency(L2gluejob);
    glue_trigger_L2glue_job.node.addDependency(GlueBucket);

Asked 1 year ago · 302 views
1 answer

Yes, you can create multiple Glue jobs in the same CDK stack file. To create another Glue job "pos table replication large" scheduled at 9am daily, you can follow these steps:

  1. Create a new Glue job similar to the existing one:
const L2gluejobLarge = new GlueJob(this, 'pos_rep_job_large', {
  jobName: 'pos table replication large',
  workerType: 'G.2X',
  numberOfWorkers: 20,
  timeout: Duration.hours(6),
  glueVersion: '4.0',
  connections: [pos_gluedbconnection],
  script: GlueScript.fromAsset(this, 'pos_table_replication_large', './pos Table Replication Large.py'),
  outputbucket: GlueBucketPath,
  outputbucketname: GlueBucket
});
  2. Create a new trigger for the new job:
const glue_trigger_L2glue_job_large = new aws_glue.CfnTrigger(
  this,
  "glue_trigger_pos_glue_job_large",
  {
    name: "Run-Job-pos-table-replication-large",
    workflowName: glue_workflow.name,
    actions: [
      {
        jobName: 'pos table replication large',
      },
    ],
    type: "SCHEDULED",
    schedule: 'cron(0 9 * * ? *)', // Run at 09:00 UTC daily (Glue cron schedules are evaluated in UTC)
    startOnCreation: true,
  }
);

glue_trigger_L2glue_job_large.node.addDependency(L2gluejobLarge);
glue_trigger_L2glue_job_large.node.addDependency(GlueBucket);

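This approach lets you define multiple Glue jobs in the same CDK stack, each with its own schedule. The new job reuses the existing JDBC database connection and is started directly by its own scheduled trigger at 9am daily, independently of the crawler. Adjust the script path and any other job-specific properties as needed. One caveat: AWS Glue documents a limit of one starting trigger (on-demand or scheduled) per workflow. Because `pos-glue-workflow` already starts from the scheduled crawler trigger, the new scheduled trigger may need to omit `workflowName` or belong to a workflow of its own.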
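If more jobs with their own schedules follow, the trigger properties can be generated from a small list instead of being copied by hand. The following is a minimal sketch under stated assumptions, not part of the original answer: it is plain TypeScript with no CDK imports, and `JobSchedule`, `ScheduledTriggerProps`, `dailyCron` and `scheduledTriggerProps` are hypothetical names introduced here. The object shape simply mirrors the `CfnTrigger` properties already used above.

```typescript
// Hypothetical helper types and functions; nothing here comes from the CDK itself.
interface JobSchedule {
  jobName: string; // must match the jobName given to the Glue job
  hourUtc: number; // hour of day, 0-23; Glue evaluates cron schedules in UTC
}

// Mirrors the CfnTrigger properties used in the answer above.
interface ScheduledTriggerProps {
  name: string;
  type: "SCHEDULED";
  schedule: string;
  startOnCreation: boolean;
  actions: { jobName: string }[];
}

// Build a Glue cron expression that fires once a day at the given UTC hour.
function dailyCron(hourUtc: number): string {
  if (!Number.isInteger(hourUtc) || hourUtc < 0 || hourUtc > 23) {
    throw new RangeError(`hourUtc must be an integer from 0 to 23, got ${hourUtc}`);
  }
  return `cron(0 ${hourUtc} * * ? *)`;
}

// Turn one job/schedule pair into the properties of a scheduled trigger.
function scheduledTriggerProps(job: JobSchedule): ScheduledTriggerProps {
  return {
    // Replace runs of whitespace with hyphens so the trigger name has no spaces.
    name: "Run-Job-" + job.jobName.trim().replace(/\s+/g, "-"),
    type: "SCHEDULED",
    schedule: dailyCron(job.hourUtc),
    startOnCreation: true,
    actions: [{ jobName: job.jobName }],
  };
}

// One entry per job that needs its own daily schedule.
const schedules: JobSchedule[] = [
  { jobName: "pos table replication large", hourUtc: 9 },
];

const triggerProps: ScheduledTriggerProps[] = schedules.map(scheduledTriggerProps);
```

Inside the stack, each element of `triggerProps` could then be passed to `new aws_glue.CfnTrigger(this, id, props)`, with a construct ID that is unique per job and the same `node.addDependency(...)` calls as in the hand-written version.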
Sources
@aws-cdk/aws-glue-alpha module · AWS CDK
class CfnTrigger (construct) · AWS CDK

Answered 1 year ago
Reviewed by an expert 1 year ago
