For anyone who runs into the same issue in the future: the cause was an old SageMaker session that was still open. If any trace of the old session remains (browser tab, CLI, etc.), close it.
Thank you for reporting this issue with your SageMaker notebook instances. Based on the information you've provided, this doesn't appear to be a widely known or documented problem. However, I can offer some insights and suggestions based on how SageMaker notebook instances typically function.
When a notebook instance's status is 'InService', it should indeed be accessible. The fact that you're experiencing timeouts even after 15 minutes suggests there might be an underlying issue with the instance or its configuration after the upgrade process.
The behavior you're observing - where stopping the instance, changing the hardware configuration, and restarting resolves the problem - indicates that there might be some inconsistency in how the instance is handling the upgrade process.
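The stop, change-hardware, restart sequence described above can be scripted. The following is a minimal sketch, not an official procedure: it assumes the instance is currently 'InService', that `sm_client` is a boto3 SageMaker client, and the function name `resize_notebook_instance` is hypothetical. It also assumes the status returns to 'Stopped' once the update has finished, which is why the "stopped" waiter is used twice.

```python
def resize_notebook_instance(sm_client, name, instance_type):
    """Stop the instance, change its type, start it, and return the final status.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    """
    stopped = sm_client.get_waiter("notebook_instance_stopped")
    in_service = sm_client.get_waiter("notebook_instance_in_service")

    # 1. Stop the running instance and wait until it reports "Stopped".
    sm_client.stop_notebook_instance(NotebookInstanceName=name)
    stopped.wait(NotebookInstanceName=name)

    # 2. Change the hardware configuration. The instance is assumed to report
    #    "Updating" and then go back to "Stopped" when the update is done.
    sm_client.update_notebook_instance(
        NotebookInstanceName=name,
        InstanceType=instance_type,
    )
    stopped.wait(NotebookInstanceName=name)

    # 3. Start it again and wait for "InService".
    sm_client.start_notebook_instance(NotebookInstanceName=name)
    in_service.wait(NotebookInstanceName=name)

    description = sm_client.describe_notebook_instance(NotebookInstanceName=name)
    return description["NotebookInstanceStatus"]
```

With boto3 installed and credentials configured, `sm_client` would be something like `boto3.client("sagemaker", region_name="eu-west-3")`, and `name` the name of your own notebook instance.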
Here are a few suggestions that might help:
- Lifecycle Configuration: If you're using a lifecycle configuration script, ensure it's compatible with the new instance type after upgrading. These scripts have a 5-minute execution limit, so if they're hanging or taking too long, it could cause issues.
- CloudWatch Logs: While you mentioned checking CloudWatch logs without seeing anomalies, it might be worth double-checking the logs in the "/aws/sagemaker/NotebookInstances" log group, specifically in the stream "[notebook-instance-name]/[LifecycleConfigHook]". This could provide more detailed information about what's happening during the instance startup.
- IAM Permissions: Ensure that your IAM roles and policies are correctly set up and haven't been inadvertently modified, especially if they need to accommodate the new instance type.
- AWS Support: If this issue persists and you're unable to resolve it through these methods, it would be advisable to contact AWS Support. They can provide more in-depth troubleshooting and potentially identify if this is a broader issue in the eu-west-3 region.
- Consistent Reproduction: Since you can consistently reproduce this issue, it would be valuable information for AWS support if you decide to reach out to them.
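For the lifecycle configuration and CloudWatch Logs suggestions above, the log stream can also be read programmatically. This is a rough sketch under a few assumptions: `logs_client` is a boto3 CloudWatch Logs client, the hook part of the stream name is `LifecycleConfigOnStart` (the other hook would be `LifecycleConfigOnCreate`), and the helper name `lifecycle_log_summary` is made up for illustration. It reads only the first page of events and uses the gap between the first and last log lines as a rough indicator of how close the script came to the 5-minute limit.

```python
LOG_GROUP = "/aws/sagemaker/NotebookInstances"
LIFECYCLE_LIMIT_SECONDS = 5 * 60  # execution limit mentioned above


def lifecycle_log_summary(logs_client, instance_name, hook="LifecycleConfigOnStart"):
    """Return (messages, elapsed_seconds) for one lifecycle hook log stream.

    `logs_client` is expected to behave like boto3.client("logs").
    Only the first page returned by get_log_events is inspected.
    """
    response = logs_client.get_log_events(
        logGroupName=LOG_GROUP,
        logStreamName=f"{instance_name}/{hook}",
        startFromHead=True,
    )
    events = response["events"]
    if not events:
        return [], 0.0

    # CloudWatch timestamps are milliseconds since the epoch.
    elapsed = (events[-1]["timestamp"] - events[0]["timestamp"]) / 1000.0
    return [event["message"] for event in events], elapsed


def near_limit(elapsed_seconds, margin_seconds=30):
    """True if the logged span is within `margin_seconds` of the limit."""
    return elapsed_seconds >= LIFECYCLE_LIMIT_SECONDS - margin_seconds
```

If the stream does not exist (for example, no lifecycle configuration is attached), the `get_log_events` call raises an error rather than returning an empty list, so a missing stream is itself a useful data point.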
While changing hardware specs back to the original configuration is a workaround, it's not an ideal long-term solution, especially considering the additional costs and time involved.
If this problem continues, documenting each occurrence with specific details (instance types, times, any error messages) will be helpful for troubleshooting and for discussions with AWS support.
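One lightweight way to keep such a record is a JSON Lines file with one entry per occurrence. The sketch below is only an illustration: the field names, the `Occurrence` class, and the helper functions are all hypothetical, and you can adapt them to whatever details AWS Support asks for.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class Occurrence:
    """One observed failure; all field names here are illustrative."""
    instance_name: str
    old_instance_type: str
    new_instance_type: str
    observed_at: str      # ISO 8601 timestamp, UTC
    error_message: str


def now_utc_iso():
    """Current UTC time as an ISO 8601 string with second precision."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def append_occurrence(path, occurrence):
    """Append one occurrence to a JSON Lines file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(occurrence)) + "\n")


def load_occurrences(path):
    """Read every recorded occurrence back as a list of dicts."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Each call to `append_occurrence` adds one line, so the file can be attached to a support case as-is.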
Sources
Access Notebook Instances - Amazon SageMaker
StartNotebookInstance - Amazon SageMaker
NotebookInstanceLifecycleHook - Amazon SageMaker
CreateNotebookInstanceLifecycleConfig - Amazon SageMaker