SageMaker Canvas Idle Metric Not Working

I was following the blog post [1] to call DeleteApp when a Canvas session has been idle for a set amount of time. However, the Lambda function was failing because it could not read the metric. When I looked at the metric, I found that it is not reporting and has not reported since the resource was first created. Oddly, its only data points go from 0 down to -6 and back up to 0, and then there is nothing.

The Canvas documentation only leads me back to the solution in [1]. I tried enabling Data Capture on the endpoint Canvas is using to see whether that would populate data points for the "TimeSinceLastActive" metric. The documentation led me to believe this information comes from the Canvas-associated endpoint, but I could not get it to work even after enabling all logging. This is adding cost to the account, because the default idle time of 2 hours can't be enforced while the metric is not working.

I have not seen other posts about this issue, so I am wondering whether anyone else is experiencing it.

[1]. https://aws.amazon.com/blogs/machine-learning/optimizing-costs-for-amazon-sagemaker-canvas-with-automatic-shutdown-of-idle-apps/

asked 16 days ago, 340 views
1 Answer
Hi Ethan, Thanks for making the inquiry. Let's examine some potential causes, step-by-step.

First, let's make sure the metrics are published to CloudWatch. From the AWS Console, navigate to the CloudWatch service and select Metrics from the left-hand menu pane. Next, locate the namespace /aws/sagemaker/Canvas/AppActivity. For each Canvas profile you are monitoring, compare the TimeSinceLastActive metric against your expectation. If you do not see metrics for your profile, please log out of the Canvas app (or delete the app from the SageMaker Console) and then relaunch it. After the app comes back online, allow at least 30 minutes for the metric to become visible, then repeat this check.
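
The same check can be scripted. Below is a minimal sketch, assuming boto3 and that the metric carries one set of dimensions per user profile; the helper name `list_reporting_profiles` is made up for illustration. It lists every dimension combination CloudWatch currently knows for TimeSinceLastActive, so a profile that is missing from the output is one that is not reporting.

```python
NAMESPACE = "/aws/sagemaker/Canvas/AppActivity"
METRIC_NAME = "TimeSinceLastActive"


def list_reporting_profiles(cloudwatch):
    """Return one {dimension name: value} dict per TimeSinceLastActive metric.

    `cloudwatch` is expected to be a boto3 CloudWatch client
    (boto3.client("cloudwatch")); it is passed in so the helper can be
    exercised without AWS access.
    """
    found = []
    kwargs = {"Namespace": NAMESPACE, "MetricName": METRIC_NAME}
    while True:
        page = cloudwatch.list_metrics(**kwargs)
        for metric in page.get("Metrics", []):
            # Flatten the list of {"Name": ..., "Value": ...} pairs into a dict.
            found.append({d["Name"]: d["Value"] for d in metric.get("Dimensions", [])})
        token = page.get("NextToken")
        if not token:
            return found
        # Ask for the next page of results.
        kwargs["NextToken"] = token
```

Calling `list_reporting_profiles(boto3.client("cloudwatch"))` from a session with `cloudwatch:ListMetrics` permission should print the same profiles you would see in the console view described above.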

Second, please check the permissions for your Lambda function. Review the IAM policy attached to the role associated with the Lambda. In the example YAML (provided for inspiration), the following snippet shows the minimum permissions the Lambda should have.

- Effect: Allow
  Action:
    - 'logs:CreateLogGroup'
    - 'logs:CreateLogStream'
    - 'logs:PutLogEvents'
    - 'cloudwatch:GetMetricData'
  Resource: '*'
- Effect: Allow
  Action:
    - 'sagemaker:DeleteApp'
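
To show where those two permissions come into play, here is a rough sketch of the kind of logic such a Lambda might run. It is not the code from the blog post. It assumes boto3 clients are passed in, that the metric's dimensions are named DomainId and UserProfileName, that the value is reported in seconds, and that the Canvas app is named "default"; the function names are made up for illustration. It also returns a distinct result when the metric has no data points, which is the situation described in the question, instead of failing on an empty list.

```python
from datetime import datetime, timedelta, timezone

NAMESPACE = "/aws/sagemaker/Canvas/AppActivity"
METRIC_NAME = "TimeSinceLastActive"


def latest_idle_seconds(cloudwatch, domain_id, user_profile_name, lookback_minutes=20):
    """Return the newest TimeSinceLastActive value, or None if there is no data."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_data(  # needs cloudwatch:GetMetricData
        MetricDataQueries=[{
            "Id": "idle",
            "MetricStat": {
                "Metric": {
                    "Namespace": NAMESPACE,
                    "MetricName": METRIC_NAME,
                    # Assumed dimension names; confirm them in the CloudWatch console.
                    "Dimensions": [
                        {"Name": "DomainId", "Value": domain_id},
                        {"Name": "UserProfileName", "Value": user_profile_name},
                    ],
                },
                "Period": 300,
                "Stat": "Maximum",
            },
        }],
        StartTime=end - timedelta(minutes=lookback_minutes),
        EndTime=end,
        ScanBy="TimestampDescending",  # newest data point first
    )
    values = response["MetricDataResults"][0]["Values"]
    return values[0] if values else None


def delete_if_idle(cloudwatch, sagemaker, domain_id, user_profile_name,
                   idle_threshold_seconds=7200):
    """Delete the Canvas app when it has been idle for at least the threshold."""
    idle = latest_idle_seconds(cloudwatch, domain_id, user_profile_name)
    if idle is None:
        return "no-data"  # metric is not reporting; nothing to decide on
    if idle < idle_threshold_seconds:
        return "active"
    sagemaker.delete_app(  # needs sagemaker:DeleteApp
        DomainId=domain_id,
        UserProfileName=user_profile_name,
        AppType="Canvas",
        AppName="default",  # assumed app name
    )
    return "deleted"
```

A "no-data" result for a profile with a running app points at the metric not being published rather than at a permissions problem, which narrows down which of the two checks to focus on.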

Can you please review these two items? If you are still blocked, please feel free to open a support case via AWS Console under Support Center so our engineers can deep dive with you. Thank you!

AWS
answered 15 days ago
  • Thank you for your response!

    I created a new user on the same domain, and its idle metric was able to report to CloudWatch. I also confirmed that the Lambda was able to delete the app, but only for the newly created user. The original user appears to be bugged, so we will just delete it and use the new user.
