SageMaker notebook instances inaccessible

Hi,

Since last week, I've observed an issue where some notebook instances fail to boot, even though their status is displayed as 'InService'. After waiting over 15 minutes, I still can't access these instances: attempts to load Jupyter or JupyterLab consistently time out.

Stopping the instance, changing the hardware configuration, and restarting it resolves the problem, but this is time-consuming and results in additional costs, which feels unfair. Is this a known issue?

  • Reproduction Steps:
  1. Create an ml.t3.medium notebook instance.
  2. Perform basic setup tasks (e.g., configure the environment).
  3. Stop the instance.
  4. Upgrade the instance type (e.g., to a g4 instance).
  5. Start the instance.
  6. The instance appears as 'InService', but neither Jupyter nor JupyterLab loads. The request times out.
  • Observations:
  1. Stopping and starting the instance again without modifying the specs works fine. The behavior only shows up when we upgrade the specs.
  2. Changing the hardware specs again, either back to the original configuration or to a higher tier, also resolves it.
  3. I'm stuck at the following screen:
We are redirecting you to your notebook for instance "NAME-HERE" now...  
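
For anyone who wants to script the reproduction steps above, here is a rough sketch. It assumes `client` is a boto3 SageMaker client; the function name `resize_notebook_instance` is made up for illustration, and the assumption that the instance returns to "Stopped" once the type change has been applied should be checked against your own account.

```python
def resize_notebook_instance(client, name, new_instance_type):
    """Stop a notebook instance, change its type, and start it again.

    `client` is assumed to be a boto3 SageMaker client, for example
    boto3.client("sagemaker", region_name="eu-west-3").
    """
    # Step 3: stop the instance and wait until it is fully stopped.
    client.stop_notebook_instance(NotebookInstanceName=name)
    client.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)

    # Step 4: change the instance type, then wait for "Stopped" again
    # (assumption: the instance passes through "Updating" first).
    client.update_notebook_instance(
        NotebookInstanceName=name, InstanceType=new_instance_type
    )
    client.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)

    # Step 5: start the instance and wait for 'InService'.
    client.start_notebook_instance(NotebookInstanceName=name)
    client.get_waiter("notebook_instance_in_service").wait(NotebookInstanceName=name)

    # Step 6: return what the API reports; Jupyter still has to be opened
    # in a browser to see whether it actually loads.
    info = client.describe_notebook_instance(NotebookInstanceName=name)
    return info["NotebookInstanceStatus"], info["InstanceType"]
```

It could be called as `resize_notebook_instance(client, "NAME-HERE", "ml.g4dn.xlarge")`, where `ml.g4dn.xlarge` is just one example of a g4 type.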

I've checked the CloudWatch logs, but they don't show any anomalies or errors. The instance seems to be up and ready to be used, so I currently have no way to trace or resolve the issue.

Is this a known problem?

Region: eu-west-3

Thank you.

asked 21 days ago · 52 views
2 Answers
Accepted Answer

For anybody having the same issue in the future: the cause was an old SageMaker session that was still open. If anything is still holding on to the old session (a browser tab, the CLI, etc.), close it.
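
After closing the old tabs, one way to open a clean session is to request a new presigned URL instead of reusing an old link. This is only a sketch: `fresh_notebook_url` is a made-up helper name, `client` is assumed to be a boto3 SageMaker client, and nothing in this thread confirms that a new URL alone clears a stale session.

```python
def fresh_notebook_url(client, name, session_seconds=1800):
    """Ask SageMaker for a new presigned URL for the notebook instance.

    `client` is assumed to be a boto3 SageMaker client.
    `session_seconds` sets how long the new session stays valid.
    """
    response = client.create_presigned_notebook_instance_url(
        NotebookInstanceName=name,
        SessionExpirationDurationInSeconds=session_seconds,
    )
    # The URL is meant to be opened promptly in a fresh browser tab.
    return response["AuthorizedUrl"]
```

For example, `fresh_notebook_url(boto3.client("sagemaker", region_name="eu-west-3"), "NAME-HERE")` would return a link to paste into a new tab.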

answered 21 days ago
EXPERT reviewed 19 days ago
EXPERT reviewed 21 days ago

Thank you for reporting this issue with your SageMaker notebook instances. Based on the information you've provided, this doesn't appear to be a widely known or documented problem. However, I can offer some insights and suggestions based on how SageMaker notebook instances typically function.

When a notebook instance's status is 'InService', it should indeed be accessible. The fact that you're experiencing timeouts even after 15 minutes suggests there might be an underlying issue with the instance or its configuration after the upgrade process.
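
To see what the service itself reports, beyond the console status, a small check along these lines may help. It is a sketch that assumes `client` is a boto3 SageMaker client; `summarize_notebook_instance` is a hypothetical helper name.

```python
def summarize_notebook_instance(client, name):
    """Return the fields of DescribeNotebookInstance most useful for triage.

    `client` is assumed to be a boto3 SageMaker client.
    """
    info = client.describe_notebook_instance(NotebookInstanceName=name)
    keys = (
        "NotebookInstanceStatus",
        "InstanceType",
        "Url",
        "FailureReason",  # only present when the service recorded a failure
        "NotebookInstanceLifecycleConfigName",  # only present if one is attached
        "LastModifiedTime",
    )
    # Keep only the fields that the response actually contains.
    return {key: info[key] for key in keys if key in info}
```

A status of 'InService' with no `FailureReason` would match what is described in the question, which points away from a failed start and toward something on the access path.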

The behavior you're observing (stopping the instance, changing the hardware configuration, and restarting it resolves the problem) suggests that the instance is not handling the instance-type change consistently.

Here are a few suggestions that might help:

  1. Lifecycle Configuration: If you're using a lifecycle configuration script, ensure it's compatible with the new instance type after upgrading. These scripts have a 5-minute execution limit, so if they're hanging or taking too long, it could cause issues.

  2. CloudWatch Logs: While you mentioned checking CloudWatch logs without seeing anomalies, it might be worth double-checking the logs in the "/aws/sagemaker/NotebookInstances" log group, specifically in the stream "[notebook-instance-name]/[LifecycleConfigHook]". This could provide more detailed information about what's happening during the instance startup.

  3. IAM Permissions: Ensure that your IAM roles and policies are correctly set up and haven't been inadvertently modified, especially if they need to accommodate the new instance type.

  4. AWS Support: If this issue persists and you're unable to resolve it through these methods, it would be advisable to contact AWS Support. They can provide more in-depth troubleshooting and potentially identify if this is a broader issue in the eu-west-3 region.

  5. Consistent Reproduction: Since you can consistently reproduce this issue, it would be valuable information for AWS support if you decide to reach out to them.
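
The log check from suggestion 2 can also be done from a script. The sketch below assumes `logs_client` is a boto3 CloudWatch Logs client; `tail_notebook_logs` is a made-up helper name, and it reads every stream whose name starts with the instance name rather than only the lifecycle hook stream.

```python
LOG_GROUP = "/aws/sagemaker/NotebookInstances"


def tail_notebook_logs(logs_client, name, limit=50):
    """Collect the most recent log messages for one notebook instance.

    `logs_client` is assumed to be a boto3 CloudWatch Logs client, e.g.
    boto3.client("logs", region_name="eu-west-3").
    """
    # Stream names start with "<notebook-instance-name>/"; the trailing
    # slash avoids matching other instances that share the same prefix.
    streams = logs_client.describe_log_streams(
        logGroupName=LOG_GROUP, logStreamNamePrefix=f"{name}/"
    )["logStreams"]

    messages_by_stream = {}
    for stream in streams:
        stream_name = stream["logStreamName"]
        # startFromHead=False asks for the newest events.
        events = logs_client.get_log_events(
            logGroupName=LOG_GROUP,
            logStreamName=stream_name,
            limit=limit,
            startFromHead=False,
        )["events"]
        messages_by_stream[stream_name] = [event["message"] for event in events]
    return messages_by_stream
```

If the returned dictionary is empty, no streams exist for that instance name, which is itself worth including in a support case.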

While changing hardware specs back to the original configuration is a workaround, it's not an ideal long-term solution, especially considering the additional costs and time involved.

If this problem continues, documenting each occurrence with specific details (instance types, times, any error messages) will be helpful for troubleshooting and for discussions with AWS support.
Sources
Access Notebook Instances - Amazon SageMaker
StartNotebookInstance - Amazon SageMaker
NotebookInstanceLifecycleHook - Amazon SageMaker
CreateNotebookInstanceLifecycleConfig - Amazon SageMaker

answered 21 days ago
EXPERT reviewed 21 days ago
