
LastRuntimeSeconds of crawler's metric has a limitation!?


Hi everyone, I need a Bash script that gives me the maximum runtime of each crawler. It seems that 'LastRuntimeSeconds' only gives the latest runtime. How can I get the maximum runtime for each crawler? I want to use Bash or Python, not the AWS console. (I used aws glue get-crawler-metrics with --query 'CrawlerMetricsList[*].LastRuntimeSeconds', and it returned just one runtime.) Thank you.

Asked a year ago, 229 views
1 Answer

To get the maximum runtime of each AWS Glue crawler over a period, you can query Amazon CloudWatch, provided a runtime metric is published there for each crawler run. By querying that metric with the Maximum statistic, you can find the longest runtime for each crawler. Here’s how you can do this using a combination of AWS CLI commands and a Python script.

Using AWS CloudWatch Metrics

1. List all Crawlers:

  • First, list all the crawlers in your AWS Glue account.

2. Get CloudWatch Metrics:

  • For each crawler, query the CloudWatch metrics to get the maximum runtime over a specified period.

Step-by-Step Guide

Step 1: List All Crawlers

You can list all your AWS Glue crawlers using the aws glue list-crawlers command.

aws glue list-crawlers --query 'CrawlerNames' --output text

Step 2: Query CloudWatch Metrics

You can then use the CloudWatch get-metric-statistics command to query the Glue Crawler Metrics for the maximum runtime.

Example Python Script

Here’s a Python script that accomplishes this:

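import boto3
from datetime import datetime, timedelta, timezone


def get_max_runtime(crawler_name, cloudwatch, start_time, end_time):
    """Return the largest 'Maximum' datapoint for the crawler, or None if there is no data."""
    response = cloudwatch.get_metric_statistics(
        Namespace='Glue',
        MetricName='CrawlerRunTime',
        Dimensions=[
            {
                'Name': 'CrawlerName',
                'Value': crawler_name
            }
        ],
        StartTime=start_time,
        EndTime=end_time,
        Period=86400,  # One day in seconds
        Statistics=['Maximum']
    )

    # An empty 'Datapoints' list means CloudWatch holds no data for this
    # namespace, metric name, and dimension in the requested window.
    datapoints = response.get('Datapoints', [])
    if not datapoints:
        return None
    return max(dp['Maximum'] for dp in datapoints)


def list_all_crawlers(glue):
    """Collect every crawler name, following NextToken across result pages."""
    names = []
    kwargs = {}
    while True:
        page = glue.list_crawlers(**kwargs)
        names.extend(page['CrawlerNames'])
        if not page.get('NextToken'):
            return names
        kwargs['NextToken'] = page['NextToken']


def main():
    glue = boto3.client('glue')
    cloudwatch = boto3.client('cloudwatch')

    # Get the list of all crawlers
    crawlers = list_all_crawlers(glue)

    # Define the time period for the metrics (timezone-aware UTC timestamps)
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(days=30)  # Last 30 days

    crawler_max_runtimes = {}

    for crawler in crawlers:
        max_runtime = get_max_runtime(crawler, cloudwatch, start_time, end_time)
        crawler_max_runtimes[crawler] = max_runtime

    for crawler, runtime in crawler_max_runtimes.items():
        if runtime is None:
            print(f"Crawler: {crawler}, no datapoints found")
        else:
            print(f"Crawler: {crawler}, Max Runtime: {runtime} seconds")


if __name__ == "__main__":
    main()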

Explanation

1. AWS SDK Initialization:

  • Initialize the AWS Glue and CloudWatch clients using boto3.

2. Get Crawler Names:

  • List all crawlers using glue.list_crawlers().

3. Get Maximum Runtime for Each Crawler:

  • For each crawler, query the CloudWatch CrawlerRunTime metric.
  • Specify the StartTime and EndTime to define the period for which you want to get the metrics.
  • Use the Maximum statistic to get the maximum runtime.

4. Output:

  • Print the maximum runtime for each crawler.

Running the Script

Make sure boto3 is installed and that AWS credentials with the necessary permissions (glue:ListCrawlers and cloudwatch:GetMetricStatistics) are configured, for example through the AWS CLI. Then run the script:

pip install boto3
python script_name.py

This script will provide the maximum runtime for each Glue crawler over the last 30 days. You can adjust the start_time and end_time variables to modify the time range as needed.
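As a minimal sketch of adjusting the time range, the window can be computed by a small helper instead of being hard-coded. The helper name time_window and the MAX_LOOKBACK_DAYS cap are hypothetical and not part of the original script; the cap assumes CloudWatch retains hourly and coarser aggregates for roughly 455 days, so longer lookbacks would probably return nothing.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CloudWatch keeps hourly and coarser aggregates for about 455 days.
MAX_LOOKBACK_DAYS = 455


def time_window(days, now=None):
    """Return (start_time, end_time) covering the last `days` days in UTC."""
    if not 1 <= days <= MAX_LOOKBACK_DAYS:
        raise ValueError(f"days must be between 1 and {MAX_LOOKBACK_DAYS}")
    end_time = now or datetime.now(timezone.utc)
    return end_time - timedelta(days=days), end_time


# Example: a 90-day window that could be passed on as StartTime and EndTime.
start_time, end_time = time_window(90)
print(start_time.isoformat(), "to", end_time.isoformat())
```

The returned pair can replace the start_time and end_time assignments in main(); with Period=86400 a 90-day window yields at most 90 datapoints per crawler.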

Expert
Answered a year ago
Expert
Reviewed a year ago
  • Thank you. However, the response contains 'Datapoints': [], so the output shows 'None seconds' for every crawler. Do you have any suggestions?
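One possible follow-up to the empty result, offered as a sketch rather than a confirmed fix from this thread: derive the runtimes from the crawl history instead of CloudWatch. This assumes the Glue ListCrawls API (list_crawls in boto3) is available in your account and SDK version and returns StartTime and EndTime for each crawl. The helper names max_crawl_runtime_seconds, fetch_crawls, and report are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def max_crawl_runtime_seconds(crawls):
    """Return the longest runtime in seconds among finished crawls, or None."""
    durations = [
        (c["EndTime"] - c["StartTime"]).total_seconds()
        for c in crawls
        if c.get("StartTime") and c.get("EndTime")  # skip crawls still running
    ]
    return max(durations) if durations else None


def fetch_crawls(glue, crawler_name):
    """Collect the crawl records of one crawler (assumes glue.list_crawls exists)."""
    crawls, kwargs = [], {"CrawlerName": crawler_name}
    while True:
        page = glue.list_crawls(**kwargs)
        crawls.extend(page.get("Crawls", []))
        if not page.get("NextToken"):
            return crawls
        kwargs["NextToken"] = page["NextToken"]


def report():
    """Print the longest recorded runtime for every crawler in the account."""
    import boto3  # imported here so the helpers above can be used without AWS access

    glue = boto3.client("glue")
    kwargs = {}
    while True:
        page = glue.list_crawlers(**kwargs)
        for name in page["CrawlerNames"]:
            runtime = max_crawl_runtime_seconds(fetch_crawls(glue, name))
            print(f"Crawler: {name}, Max Runtime: {runtime}")
        if not page.get("NextToken"):
            break
        kwargs["NextToken"] = page["NextToken"]
```

Calling report() prints one line per crawler. The result only covers the crawls that the history still retains, so it is a maximum over the recorded runs, not over the crawler's whole lifetime.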
