To get the maximum runtime of each AWS Glue crawler over a period, you can query Amazon CloudWatch, provided a runtime metric is published there for each crawler run in your account. By requesting the Maximum statistic for that metric, you can find the longest run of each crawler. Here’s how you can do this using a combination of an AWS CLI command and a Python script.
Using AWS CloudWatch Metrics
1. List all Crawlers:
- First, list all the crawlers in your AWS Glue account and Region.
2. Get CloudWatch Metrics:
- For each crawler, query the CloudWatch metrics to get the maximum runtime over a specified period.
Step-by-Step Guide
Step 1: List All Crawlers
You can list all your AWS Glue crawlers using the aws glue list-crawlers command.
aws glue list-crawlers --query 'CrawlerNames' --output text
Step 2: Query CloudWatch Metrics
You can then use the CloudWatch get-metric-statistics command to query the crawler runtime metric with the Maximum statistic.
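Before querying statistics, it may help to confirm that CloudWatch actually lists the metric you intend to query. The sketch below uses the CloudWatch list_metrics call for that check. The helper name find_crawler_metrics is hypothetical, and the namespace Glue and metric name CrawlerRunTime are taken from this answer as assumptions, not confirmed values. An empty result would suggest that the metric is not published under that name, or has not reported data recently.

```python
def find_crawler_metrics(cloudwatch, namespace="Glue", metric_name="CrawlerRunTime"):
    """Return the dimension sets that CloudWatch lists for the given metric.

    `cloudwatch` is expected to be a boto3 CloudWatch client. The default
    namespace and metric name are assumptions copied from this answer.
    """
    found = []
    kwargs = {"Namespace": namespace, "MetricName": metric_name}
    while True:
        page = cloudwatch.list_metrics(**kwargs)
        for metric in page.get("Metrics", []):
            # Flatten [{'Name': ..., 'Value': ...}] into {name: value}
            found.append({d["Name"]: d["Value"] for d in metric.get("Dimensions", [])})
        token = page.get("NextToken")
        if not token:
            return found
        kwargs["NextToken"] = token
```

With a real client this could be called as find_crawler_metrics(boto3.client('cloudwatch')). If it returns an empty list, get-metric-statistics for the same namespace and metric name will return no datapoints either.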
Example Python Script
Here’s a Python script that accomplishes this:
import boto3
from datetime import datetime, timedelta, timezone


def get_max_runtime(crawler_name, cloudwatch, start_time, end_time):
    """Return the largest daily Maximum datapoint, or None if there are none."""
    response = cloudwatch.get_metric_statistics(
        Namespace='Glue',
        MetricName='CrawlerRunTime',
        Dimensions=[
            {
                'Name': 'CrawlerName',
                'Value': crawler_name
            }
        ],
        StartTime=start_time,
        EndTime=end_time,
        Period=86400,  # One day in seconds
        Statistics=['Maximum']
    )
    # An empty Datapoints list means CloudWatch has no data for this
    # namespace/metric/dimension combination in the requested window.
    datapoints = response.get('Datapoints', [])
    if datapoints:
        return max(dp['Maximum'] for dp in datapoints)
    return None


def main():
    glue = boto3.client('glue')
    cloudwatch = boto3.client('cloudwatch')

    # Get the list of all crawlers (first page of results only)
    crawlers = glue.list_crawlers()['CrawlerNames']

    # Define the time period for the metrics
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(days=30)  # Last 30 days

    crawler_max_runtimes = {}
    for crawler in crawlers:
        max_runtime = get_max_runtime(crawler, cloudwatch, start_time, end_time)
        crawler_max_runtimes[crawler] = max_runtime

    for crawler, runtime in crawler_max_runtimes.items():
        if runtime is None:
            print(f"Crawler: {crawler}, Max Runtime: no datapoints returned")
        else:
            print(f"Crawler: {crawler}, Max Runtime: {runtime} seconds")


if __name__ == "__main__":
    main()
Explanation
1. AWS SDK Initialization:
- Initialize the AWS Glue and CloudWatch clients using boto3.
2. Get Crawler Names:
- List all crawlers using glue.list_crawlers().
3. Get Maximum Runtime for Each Crawler:
- For each crawler, query the CloudWatch CrawlerRunTime metric.
- Specify the StartTime and EndTime to define the period for which you want to get the metrics.
- Use the Maximum statistic to get the maximum runtime.
4. Output:
- Print the maximum runtime for each crawler.
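One detail worth noting about step 2: a single glue.list_crawlers() call returns one page of names, so accounts with many crawlers may need to follow NextToken. A minimal sketch, assuming glue is a boto3 Glue client (the helper name list_all_crawler_names is hypothetical):

```python
def list_all_crawler_names(glue):
    """Collect crawler names across all pages of list_crawlers results."""
    names = []
    kwargs = {}
    while True:
        page = glue.list_crawlers(**kwargs)
        names.extend(page.get("CrawlerNames", []))
        token = page.get("NextToken")
        if not token:
            return names
        # Ask for the next page on the following iteration
        kwargs["NextToken"] = token
```

The result can be used in place of the single list_crawlers() call in main().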
Running the Script
Make sure AWS credentials with the necessary permissions (for example, glue:ListCrawlers and cloudwatch:GetMetricStatistics) are configured and boto3 is installed, then run the script:
pip install boto3
python script_name.py
This script will provide the maximum runtime for each Glue crawler over the last 30 days. You can adjust the start_time and end_time variables to modify the time range as needed.
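If the CloudWatch query returns an empty Datapoints list for every crawler, one alternative is to compute the runtime from the crawl history in Glue itself. The sketch below is a different technique from the CloudWatch approach above: it assumes your boto3 version exposes the Glue list_crawls call and that each returned crawl carries StartTime and EndTime values. The helper name max_crawl_runtime_seconds is hypothetical.

```python
def max_crawl_runtime_seconds(glue, crawler_name, since):
    """Return the longest finished crawl (in seconds) that started at or after `since`.

    `glue` is expected to be a boto3 Glue client. `since` should be a
    timezone-aware datetime, because boto3 returns timezone-aware timestamps.
    Returns None when no finished crawl falls inside the window.
    """
    longest = None
    kwargs = {"CrawlerName": crawler_name}
    while True:
        page = glue.list_crawls(**kwargs)
        for crawl in page.get("Crawls", []):
            start, end = crawl.get("StartTime"), crawl.get("EndTime")
            # Skip crawls that are still running or outside the window
            if start is None or end is None or start < since:
                continue
            seconds = (end - start).total_seconds()
            if longest is None or seconds > longest:
                longest = seconds
        token = page.get("NextToken")
        if not token:
            return longest
        kwargs["NextToken"] = token
```

For the last 30 days, this could be called with since=datetime.now(timezone.utc) - timedelta(days=30) for each crawler name. The calling identity would presumably also need permission for the list_crawls operation.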

Thank you. However, the response contains 'Datapoints': [], so the output shows 'None seconds' for every crawler. Do you have any suggestions for this?