Python script for Cost Estimate for AWS MSK Serverless


I want to estimate the cost of MSK Serverless for the current workload I have in my production environment. I can't find an MSK Serverless option in the AWS Pricing Calculator, but the AWS MSK pricing page lists these pricing dimensions and units:

  - Cluster-hours, per hour
  - Partition-hours, per hour
  - Storage, per GiB-month
  - Data In, per GiB
  - Data Out, per GiB

I need the above data. How can I use a Python script to calculate these five parameters for my current workload from the CloudWatch metrics available in the production account?

1 Answer
Accepted Answer

To estimate the cost for MSK Serverless based on your current workload using CloudWatch metrics, you can use the following Python script. This script will retrieve the necessary metrics from CloudWatch and calculate the cost based on the pricing dimensions provided.

Prerequisites

  1. AWS SDK for Python (Boto3): Ensure Boto3 is installed (pip install boto3).
  2. AWS Credentials: Configure AWS credentials with permission to read CloudWatch metrics (cloudwatch:GetMetricStatistics), for example via aws configure or an attached IAM role. A quick way to confirm they are picked up is shown below.
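
A minimal sanity check, assuming your credentials and region are already configured (your-region is a placeholder), is to confirm which account the script will run against:

import boto3

# Confirm that credentials resolve before running the cost script
sts = boto3.client('sts', region_name='your-region')
print("Using AWS account:", sts.get_caller_identity()['Account'])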

Script

import boto3
from datetime import datetime, timedelta

# Initialize CloudWatch client
cloudwatch = boto3.client('cloudwatch', region_name='your-region')

# Define the time range for metrics collection
end_time = datetime.utcnow()
start_time = end_time - timedelta(days=30)

# Function to get CloudWatch metrics
def get_metric_statistics(namespace, metric_name, dimensions, start_time, end_time, period=3600, statistics=['Sum']):
    response = cloudwatch.get_metric_statistics(
        Namespace=namespace,
        MetricName=metric_name,
        Dimensions=dimensions,
        StartTime=start_time,
        EndTime=end_time,
        Period=period,
        Statistics=statistics
    )
    return response['Datapoints']

# Cluster-hours: MSK does not publish a "ClusterHours" metric. For a serverless
# cluster that runs continuously, cluster-hours are simply the hours in the
# measurement window, so derive them from the time range.
cluster_hours_total = (end_time - start_time).total_seconds() / 3600

# Partition-hours: PartitionCount is a gauge, so take the hourly Average and
# sum the hourly values to get partition-hours over the window.
# Note: AWS/Kafka metrics use the dimension name "Cluster Name" (with a space).
partition_hours = get_metric_statistics(
    namespace='AWS/Kafka',
    metric_name='PartitionCount',
    dimensions=[{'Name': 'Cluster Name', 'Value': 'your-cluster-name'}],
    start_time=start_time,
    end_time=end_time,
    statistics=['Average']
)
partition_hours_total = sum(point['Average'] for point in partition_hours)

# Storage: take the hourly Average of the storage-size metric. If your cluster
# does not publish "StorageBytes" (provisioned clusters expose disk usage as
# KafkaDataLogsDiskUsed, a percentage), adjust the metric name or derive bytes
# from your broker storage size instead.
storage = get_metric_statistics(
    namespace='AWS/Kafka',
    metric_name='StorageBytes',
    dimensions=[{'Name': 'Cluster Name', 'Value': 'your-cluster-name'}],
    start_time=start_time,
    end_time=end_time,
    statistics=['Average']
)

# GiB-months = time-averaged storage in GiB, scaled by the fraction of a month covered
window_days = (end_time - start_time).days or 1
avg_storage_gib = sum(point['Average'] for point in storage) / len(storage) / (1024**3) if storage else 0
storage_gib_month = avg_storage_gib * (window_days / 30)

# Data In: BytesInPerSec is a rate (bytes/second), so take the hourly Average,
# multiply by 3600 seconds to get bytes for that hour, then sum and convert to GiB.
data_in = get_metric_statistics(
    namespace='AWS/Kafka',
    metric_name='BytesInPerSec',
    dimensions=[{'Name': 'Cluster Name', 'Value': 'your-cluster-name'}],
    start_time=start_time,
    end_time=end_time,
    statistics=['Average']
)
data_in_gib = sum(point['Average'] * 3600 for point in data_in) / (1024**3)

# Data Out: same conversion as Data In, using BytesOutPerSec.
data_out = get_metric_statistics(
    namespace='AWS/Kafka',
    metric_name='BytesOutPerSec',
    dimensions=[{'Name': 'Cluster Name', 'Value': 'your-cluster-name'}],
    start_time=start_time,
    end_time=end_time,
    statistics=['Average']
)
data_out_gib = sum(point['Average'] * 3600 for point in data_out) / (1024**3)

# Pricing (example values, replace with actual prices)
price_per_cluster_hour = 0.10  # in USD
price_per_partition_hour = 0.01  # in USD
price_per_gib_month_storage = 0.10  # in USD
price_per_gib_data_in = 0.02  # in USD
price_per_gib_data_out = 0.05  # in USD

# Calculate costs
cluster_hours_cost = cluster_hours_total * price_per_cluster_hour
partition_hours_cost = partition_hours_total * price_per_partition_hour
storage_cost = storage_gib_month * price_per_gib_month_storage
data_in_cost = data_in_gib * price_per_gib_data_in
data_out_cost = data_out_gib * price_per_gib_data_out

# Total cost
total_cost = cluster_hours_cost + partition_hours_cost + storage_cost + data_in_cost + data_out_cost

print(f"Cluster Hours Cost: ${cluster_hours_cost:.2f}")
print(f"Partition Hours Cost: ${partition_hours_cost:.2f}")
print(f"Storage Cost: ${storage_cost:.2f}")
print(f"Data In Cost: ${data_in_cost:.2f}")
print(f"Data Out Cost: ${data_out_cost:.2f}")
print(f"Total Cost: ${total_cost:.2f}")

Explanation

  1. Retrieve Metrics: The script uses the get_metric_statistics helper to pull the relevant metrics from CloudWatch over the last 30 days.
  2. Convert and Sum Metrics: Gauge metrics (PartitionCount, storage) are time-averaged into partition-hours and GiB-months, and rate metrics (BytesInPerSec, BytesOutPerSec) are converted from bytes per second into total GiB.
  3. Cost Calculation: The script multiplies each usage figure by its price per unit (see the short worked example after this list).
  4. Output: The script prints the individual costs and the total estimated cost for running MSK Serverless.
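
As a purely illustrative example using the placeholder prices above: a 30-day window with 720 cluster-hours, an average of 100 partitions (72,000 partition-hours), 500 GiB-months of storage, 2,000 GiB in and 1,000 GiB out would come to 720 × 0.10 + 72,000 × 0.01 + 500 × 0.10 + 2,000 × 0.02 + 1,000 × 0.05 = 72 + 720 + 50 + 40 + 50 = 932 USD. Substitute your own usage and the current prices from the MSK pricing page.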

Replace placeholders such as your-region and your-cluster-name with your actual AWS region and MSK cluster name. Adjust the pricing values to reflect the current prices from the AWS MSK pricing page.
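
If every result comes back as zero, the usual causes are metric names or dimension names that do not match what your cluster actually publishes, or querying the wrong region. One way to check, assuming your cluster reports into the standard AWS/Kafka namespace, is to list the metrics CloudWatch has for it:

import boto3

cloudwatch = boto3.client('cloudwatch', region_name='your-region')

# List every metric MSK publishes so you can confirm the exact metric names
# and dimension names (e.g. "Cluster Name" rather than "ClusterName")
paginator = cloudwatch.get_paginator('list_metrics')
for page in paginator.paginate(Namespace='AWS/Kafka'):
    for metric in page['Metrics']:
        print(metric['MetricName'], metric['Dimensions'])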

This script should help you estimate the cost of running MSK Serverless based on your production workload.

answered 3 months ago
  • I tried using the above script, but it returns zero for everything. I tried changing clusters and regions, and it still gives zero for all of them. Do we need access at the broker level?
