How do I check the resource utilization for my SageMaker AI notebook instance?


I started an Amazon SageMaker AI notebook instance to train models or load large datasets, and the notebook instance appears to be frozen. I can't view my SageMaker AI instance resource use.

Resolution

When your browser or your SageMaker AI notebook instance appears unresponsive, run Linux commands in the instance terminal or review Amazon CloudWatch metrics to view your resource utilization.

Run a Linux command to view SageMaker AI resource utilization

Complete the following steps:

  1. Open the SageMaker AI console.
  2. In the navigation pane, choose Notebook instances.
  3. Next to your SageMaker AI notebook instance, open Jupyter or JupyterLab.
  4. Open the terminal.
  5. Run the following commands to view your resource utilization.
    System memory, processor load, and the most active processes:
    top
    All running processes:
    ps -ax
    Disk space utilization and availability:
    df -h
    RAM utilization and availability, in MiB:
    free -m
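
If the terminal is slow to respond, a one-shot snapshot can be easier to read than the interactive top view. The following is a minimal sketch that runs the same four tools non-interactively. The function name resource_snapshot and the line limits are illustrative choices, not part of the original steps.

```shell
#!/bin/bash
# Print a single, non-interactive snapshot of the four views listed above.
# The function name and the head limits are illustrative assumptions.
resource_snapshot() {
    echo "== Load and memory summary (top) =="
    # -b runs top in batch mode, -n 1 limits it to one iteration
    top -b -n 1 | head -n 5

    echo "== Top processes by memory (ps) =="
    ps -eo pid,comm,%mem,%cpu --sort=-%mem | head -n 10

    echo "== Disk usage (df) =="
    df -h

    echo "== RAM in MiB (free) =="
    free -m
}

resource_snapshot
```

To keep a record for later comparison, you can redirect the output to a file, for example resource_snapshot > snapshot.txt.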

Use CloudWatch metrics to view SageMaker AI resource utilization

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Use a lifecycle script. For example, the publish-instance-metrics script publishes the system-level metrics from the notebook instance to CloudWatch. For more information, see publish-instance-metrics / on-start.sh on the GitHub website.

Note: To send instance metrics to CloudWatch, instances must assume an AWS Identity and Access Management (IAM) execution role. Add the cloudwatch:PutMetricData permission to the IAM policy that's attached to the execution role.

Example policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricData"
      ],
      "Resource": "*"
    }
  ]
}
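
One way to attach the example policy is as an inline policy through the AWS CLI. The following sketch writes the policy document to a local file and then calls aws iam put-role-policy. The file name, the policy name NotebookPutMetricData, and the role name in the example call are placeholders that you replace with your own values.

```shell
#!/bin/bash
# File and policy names are placeholders, not values from the article.
POLICY_FILE="put-metric-data-policy.json"

# Write the example policy document to a local file.
write_metrics_policy() {
    cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["cloudwatch:PutMetricData"],
      "Resource": "*"
    }
  ]
}
EOF
}

# Attach the document to the execution role as an inline policy.
attach_metrics_policy() {
    local role_name="$1"
    write_metrics_policy "$POLICY_FILE" || return 1
    aws iam put-role-policy \
        --role-name "$role_name" \
        --policy-name "NotebookPutMetricData" \
        --policy-document "file://${POLICY_FILE}"
}

# Example call (the role name is hypothetical):
#   attach_metrics_policy "MySageMakerExecutionRole"
```

The caller needs permission to modify the role, and the role name is the name of the execution role, not its ARN.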

When you turn on CloudWatch Logs for the lifecycle configuration, use a SageMaker role with the following permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogDelivery",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:DeleteLogDelivery",
        "logs:Describe*",
        "logs:GetLogDelivery",
        "logs:GetLogEvents",
        "logs:ListLogDeliveries",
        "logs:PutLogEvents",
        "logs:PutResourcePolicy",
        "logs:UpdateLogDelivery"
      ],
      "Resource": "*"
    }
  ]
}

Make sure that the notebook instance has internet connectivity so that the script can download the amazon-cloudwatch-agent.json configuration file. Otherwise, the script fails. If the notebook instance doesn't have internet access, then manually download the .json file from GitHub to your local machine. Upload the file to an Amazon Simple Storage Service (Amazon S3) bucket, and then modify the bash code to copy the configuration file from the S3 bucket. In the on-start.sh lifecycle configuration (LCC) script, remove the line that uses the wget command. Then, add the s3 cp AWS CLI command to copy the .json file from the S3 bucket to a directory. It's a best practice to put the CloudWatch agent configuration file in a directory, and then run the following command from that directory to start the agent:

/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a \
append-config -m ec2 -c file://$(pwd)/amazon-cloudwatch-agent.json
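
For a notebook instance without internet access, the replacement for the wget line might look like the following sketch. The bucket name amzn-s3-demo-bucket and the function name fetch_agent_config are placeholders and aren't taken from the sample script. The sketch assumes that the execution role can read the object.

```shell
#!/bin/bash
# Sketch of a replacement for the wget line in on-start.sh.
# The bucket name is a placeholder; the key matches the file name used above.
CONFIG_BUCKET="amzn-s3-demo-bucket"
CONFIG_KEY="amazon-cloudwatch-agent.json"

# Copy the agent configuration file from S3 into the given directory.
fetch_agent_config() {
    local dest_dir="$1"
    mkdir -p "$dest_dir" || return 1
    aws s3 cp "s3://${CONFIG_BUCKET}/${CONFIG_KEY}" "${dest_dir}/${CONFIG_KEY}"
}

# In the lifecycle script, call it before the agent command, for example:
#   fetch_agent_config "$(pwd)"
```

Because the agent command reads file://$(pwd)/amazon-cloudwatch-agent.json, copy the file into the directory that the script runs the agent command from.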

Make sure that you create interface virtual private cloud (VPC) endpoints so that you can access other AWS services, such as Amazon S3 and CloudWatch.

Configure the SageMaker AI notebook to view CloudWatch metrics

Complete the following steps:

  1. Open the SageMaker AI console.

  2. In the navigation pane, choose Notebook instances.

  3. Next to your SageMaker AI notebook instance, open Jupyter or JupyterLab.

  4. Open the terminal.

  5. Run the following command to open amazon-cloudwatch-agent-config-wizard:

    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
  6. Follow the steps in the wizard, and then when prompted, complete the following steps:
    Choose On-premises host.
    For StatsD Daemon, choose no.
    For CollectD, choose no.

  7. Run the following command to start the CloudWatch agent on your server, and include the config.json file that the wizard created:

    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:///opt/aws/amazon-cloudwatch-agent/bin/config.json -s
  8. Open the CloudWatch console.

  9. Choose Metrics, and then choose CWAgent to view your SageMaker AI metrics.
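
To confirm from the terminal that metrics are arriving, you can also list the metric names in the CWAgent namespace with the AWS CLI. The following is a minimal sketch. The function name is illustrative, and the call assumes that the credentials in use have the cloudwatch:ListMetrics permission.

```shell
#!/bin/bash
# List the names of the metrics published under the CWAgent namespace.
# The function name is an illustrative choice.
list_cwagent_metrics() {
    aws cloudwatch list-metrics --namespace CWAgent \
        --query 'Metrics[].MetricName' --output text
}

# Example call:
#   list_cwagent_metrics
```

An empty result can mean that the agent hasn't published data yet or that it publishes to a different namespace or AWS Region.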

To view other example AWS lifecycle configuration scripts for SageMaker AI notebooks, see amazon-sagemaker-notebook-instance-lifecycle-config-samples on the GitHub website.

Related information

Metrics for monitoring Amazon SageMaker AI with Amazon CloudWatch

Metrics collected by the CloudWatch agent

Tools for monitoring the AWS resources provisioned while using Amazon SageMaker AI

Terminals on the JupyterLab website

AWS OFFICIAL. Updated 22 days ago