SageMaker Studio JupyterLab 3.0 working poorly with SM Resources UI

Hi all,

Since JupyterLab 3.0 was finally released on SM Studio, we have been super happy with it. However, for reasons unknown to us, the JupyterLab interface works very poorly with SM resources. The following behaviour has been observed:

  1. It takes FOREVER to load the page for SM pipelines, and half the time it reports an error ("Error listing pipeline executions: Rate exceeded")
  2. Changing the instance type and size for a notebook is now super laggy, and does not work half the time

Does anyone know whether this is merely a lack of optimisation on the service team's part, or is there something I can do to stop this behaviour? It's making our work very slow and unbearable. We don't want to revert to 1.0, so any help would be greatly appreciated!
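One way to check whether the "Rate exceeded" message comes from API throttling rather than from the Studio UI itself is to call the listing API directly and retry with exponential backoff. The following is a minimal sketch, not a confirmed fix: it assumes the error surfaces as a botocore-style exception whose `response["Error"]` carries a throttling code or the "Rate exceeded" message, and the helper names (`is_throttling_error`, `call_with_backoff`, `list_executions`) and the pipeline name are hypothetical.

```python
import time


def is_throttling_error(exc):
    """Return True if exc looks like a botocore ClientError caused by throttling."""
    response = getattr(exc, "response", None) or {}
    error = response.get("Error", {})
    return (
        error.get("Code") in {"ThrottlingException", "Throttling"}
        or "Rate exceeded" in error.get("Message", "")
    )


def call_with_backoff(func, kwargs, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(**kwargs), retrying throttled calls with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func(**kwargs)
        except Exception as exc:
            # Re-raise anything that is not throttling, or the final failed attempt.
            if not is_throttling_error(exc) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def list_executions(client, pipeline_name):
    """Return the first page of execution summaries for one pipeline."""
    response = call_with_backoff(
        client.list_pipeline_executions, {"PipelineName": pipeline_name}
    )
    return response["PipelineExecutionSummaries"]


# Example usage (requires boto3 and AWS credentials; "my-pipeline" is a placeholder):
#   import boto3
#   sm = boto3.client("sagemaker")
#   for e in list_executions(sm, "my-pipeline"):
#       print(e["PipelineExecutionArn"], e["PipelineExecutionStatus"])
```

If the direct call is also throttled, the limit is being hit at the account or API level rather than in the JupyterLab extension, which is useful detail for a support case.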

Best, RUoy

  • I receive the same error message, but only when my pipeline is more complex and contains more steps (8). The ones with only three steps load without any problem.

asked 2 years ago, 766 views
1 Answer

Thank you for sharing your observations.

To troubleshoot the issue, please delete the existing default Jupyter server app for the user profile and create a new app by launching Studio. Let us know if you are still facing the issue.
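For reference, the deletion step can also be scripted. This is a minimal sketch assuming boto3's SageMaker client; the function name and the domain and profile values are placeholders, and the app name `default` with type `JupyterServer` is assumed to be the default Studio server app, so confirm it with `list_apps` before deleting anything.

```python
def delete_default_jupyter_server(client, domain_id, user_profile_name):
    """Delete the default JupyterServer app for one Studio user profile.

    Studio is expected to create a fresh app the next time the user launches it.
    """
    return client.delete_app(
        DomainId=domain_id,
        UserProfileName=user_profile_name,
        AppType="JupyterServer",
        AppName="default",  # assumed name of the default server app
    )


# Example usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   sm = boto3.client("sagemaker")
#   delete_default_jupyter_server(sm, "d-xxxxxxxxxxxx", "my-user-profile")
```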

Please also confirm whether the behaviour is the same for all users of the Studio domain or only for specific user profiles.

To investigate further, please create a support ticket with the AWS technical team and include the following information:

  • Domain ID
  • User profile ARN
  • CloudWatch logs

Please also share the output of the list-pipelines [1] CLI command.
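If the CLI is not convenient, the same listing can be collected from Python. A minimal sketch, assuming boto3's SageMaker client and its `list_pipelines` call with `NextToken` pagination; the function name `list_all_pipelines` is hypothetical.

```python
def list_all_pipelines(client):
    """Collect pipeline summaries across all pages of list_pipelines."""
    summaries = []
    kwargs = {}
    while True:
        page = client.list_pipelines(**kwargs)
        summaries.extend(page.get("PipelineSummaries", []))
        token = page.get("NextToken")
        if not token:
            return summaries
        # Ask for the next page on the following iteration.
        kwargs["NextToken"] = token


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   sm = boto3.client("sagemaker")
#   for p in list_all_pipelines(sm):
#       print(p["PipelineName"], p["PipelineArn"])
```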

Note: If you still have difficulties, I recommend opening a support case and providing more detail about your account along with the details requested above. For security reasons, we cannot discuss account-specific issues in public posts.

Thank you. Reference:

[1] https://docs.aws.amazon.com/cli/latest/reference/sagemaker/list-pipelines.html

AWS
answered 2 years ago
  • Hi there, thanks a lot for your help! Please find my responses below:

    1. I have tried deleting the JupyterLab app multiple times and the same behaviour persists
    2. The same behaviour is observed across our team of 5 active users in the same domain
    2.5. The same behaviour is observed even if we revert to 1.0
    3. I created a support ticket under the category "system impaired" more than 36 hours ago and have not received a response yet. The ticket ID is Case ID 10300240931. I will make sure to include the requested information in the ticket log
    4. I will share the list-pipelines output in the ticket log
    5. An additional observation: if I create a carbon copy of the same pipelines, some of these copies are accessible (near-instant load time with no failure of the pipeline execution page), but some still demonstrate the same behaviour
    6. Not sure if it's related, but we have a service quota of 100 GPU (g4dn.xlarge) instances for our DL workloads. However, we have not been able to use any of it; we keep getting throttled on the instance type. The timing coincided with all the issues I've described above, just a heads-up.

    Thank you so much!
