スキップしてコンテンツを表示

LastRuntimeSeconds of crawler's metric has a limitation!?

0

Hi everyone, I need a bash script which gives me the maximum runtime of each crawler. it seems that the 'LastRuntimeSeconds' just gives the latest run time.how can I have the max of run time for each crawler? I wanna use bash or python. not aws console (I usded aws glue get-crawler-metrics , with -query 'CrawlerMetricsList[*].LastRuntimeSeconds' , it just passed one runtime) thank you

質問済み 1年前229ビュー
1回答
0

To get the maximum runtime of each AWS Glue crawler over a period, you can use AWS CloudWatch metrics, as Glue logs metrics there for each run. By querying these metrics, you can find the maximum runtime for each crawler. Here’s how you can do this using a combination of aws CLI commands and a Python script.

Using AWS CloudWatch Metrics

1. List all Crawlers:

  • First, list all the crawlers in your AWS Glue.

2. Get CloudWatch Metrics:

  • For each crawler, query the CloudWatch metrics to get the maximum runtime over a specified period.

Step-by-Step Guide

Step 1: List All Crawlers

You can list all your AWS Glue crawlers using the aws glue list-crawlers command.

aws glue list-crawlers --query 'CrawlerNames' --output text

Step 2: Query CloudWatch Metrics

You can then use the CloudWatch get-metric-statistics command to query the Glue Crawler Metrics for the maximum runtime.

Example Python Script

Here’s a Python script that accomplishes this:

import boto3
from datetime import datetime, timedelta

def get_max_runtime(crawler_name, cloudwatch, start_time, end_time):
    response = cloudwatch.get_metric_statistics(
        Namespace='Glue',
        MetricName='CrawlerRunTime',
        Dimensions=[
            {
                'Name': 'CrawlerName',
                'Value': crawler_name
            }
        ],
        StartTime=start_time,
        EndTime=end_time,
        Period=86400,  # One day in seconds
        Statistics=['Maximum']
    )

    if 'Datapoints' in response and response['Datapoints']:
        return max(dp['Maximum'] for dp in response['Datapoints'])
    else:
        return None

def main():
    glue = boto3.client('glue')
    cloudwatch = boto3.client('cloudwatch')

    # Get the list of all crawlers
    crawlers = glue.list_crawlers()['CrawlerNames']

    # Define the time period for the metrics
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(days=30)  # Last 30 days

    crawler_max_runtimes = {}

    for crawler in crawlers:
        max_runtime = get_max_runtime(crawler, cloudwatch, start_time, end_time)
        crawler_max_runtimes[crawler] = max_runtime

    for crawler, runtime in crawler_max_runtimes.items():
        print(f"Crawler: {crawler}, Max Runtime: {runtime} seconds")

if __name__ == "__main__":
    main()

Explanation

1. AWS SDK Initialization:

  • Initialize the AWS Glue and CloudWatch clients using boto3.

2. Get Crawler Names:

  • List all crawlers using glue.list_crawlers().

3. Get Maximum Runtime for Each Crawler:

  • For each crawler, query the CloudWatch CrawlerRunTime metric.
  • Specify the StartTime and EndTime to define the period for which you want to get the metrics.
  • Use the Maximum statistic to get the maximum runtime.

4. Output:

  • Print the maximum runtime for each crawler.

Running the Script

Make sure you have the AWS CLI configured with the necessary permissions and boto3 installed. You can run the script in an environment where AWS CLI is configured:

pip install boto3
python script_name.py

This script will provide the maximum runtime for each Glue crawler over the last 30 days. You can adjust the start_time and end_time variables to modify the time range as needed.

エキスパート
回答済み 1年前
エキスパート
レビュー済み 1年前
  • thank you , however 'Datapoints': [] . so it shows 'None seconds' in output for every crawler. do you have any opinion for this?

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

関連するコンテンツ